In [1]:
import logging
import sys
import os
from dotenv import load_dotenv

In [2]:
load_dotenv()

True

In [6]:
import qdrant_client
from qdrant_client import models
from llama_index.core import SimpleDirectoryReader
from llama_index.embeddings.fastembed import FastEmbedEmbedding
from llama_index.llms.groq import Groq

In [7]:
data = SimpleDirectoryReader('data').load_data()

In [8]:
len(data)

647

In [9]:
data[100]

Document(id_='7d3ab9b2-919c-498a-9b9d-32ba1228d2bc', embedding=None, metadata={'page_label': '67', 'file_name': 'ComputerScienceOne.pdf', 'file_path': 'c:\\Users\\Dell\\Documents\\GitHub\\MultiModal-RAG\\faster rag\\data\\ComputerScienceOne.pdf', 'file_type': 'application/pdf', 'file_size': 2300943, 'creation_date': '2025-07-28', 'last_modified_date': '2025-07-28'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text_resource=MediaResource(embeddings=None, data=None, text='3.1. Logical Operators\nPsuedocode Code Meaning Type\n< < less than relational\n> > greater than relational\n≤ <= less than or equal to relational\n≥ >= greater than or equal to relational\n= == equal to equality\n̸= != not equal 

In [10]:
data[100].text

'3.1. Logical Operators\nPsuedocode Code Meaning Type\n< < less than relational\n> > greater than relational\n≤ <= less than or equal to relational\n≥ >= greater than or equal to relational\n= == equal to equality\n̸= != not equal to equality\nTable 3.1.: Comparison Operators\nComparisons can also be used with more complex expressions such as\n√\nb2 −4ac <0\nwhich could commonly be expressed in code as\nsqrt(b*b - 4*a*c) < 0\nObserve that both operands could be constants, such as 5 ≤10 but there would be little\npoint. Since both are constants, the truth value of the expression is already determined\nbefore the program runs. Such an expression could easily be replaced with a simple true\nor false variable. These are referred to as tautologies and contradictions respectively.\nWe’ll examine them in more detail below.\nPitfalls\nSometimes you may want to check that a variable falls within a certain range. For\nexample, we may want to test that xlies in the interval [0,10] (between 0 and 

In [11]:
texts = [doc.text for doc in data]

In [12]:
texts[100]

'3.1. Logical Operators\nPsuedocode Code Meaning Type\n< < less than relational\n> > greater than relational\n≤ <= less than or equal to relational\n≥ >= greater than or equal to relational\n= == equal to equality\n̸= != not equal to equality\nTable 3.1.: Comparison Operators\nComparisons can also be used with more complex expressions such as\n√\nb2 −4ac <0\nwhich could commonly be expressed in code as\nsqrt(b*b - 4*a*c) < 0\nObserve that both operands could be constants, such as 5 ≤10 but there would be little\npoint. Since both are constants, the truth value of the expression is already determined\nbefore the program runs. Such an expression could easily be replaced with a simple true\nor false variable. These are referred to as tautologies and contradictions respectively.\nWe’ll examine them in more detail below.\nPitfalls\nSometimes you may want to check that a variable falls within a certain range. For\nexample, we may want to test that xlies in the interval [0,10] (between 0 and 

In [13]:
len(texts)

647

In [14]:
llm = Groq(model="deepseek-r1-distill-llama-70b", api_key=os.getenv("GROQ_API_KEY"))

In [15]:
response = llm.complete("What is the meaning of life").text

In [16]:
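# Keep only the text after the model's reasoning block; [-1] avoids an
# IndexError when the response contains no "</think>" tag.
result = response.split("</think>", 1)[-1].strip()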
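
The split on `</think>` above depends on the model emitting a reasoning block. Below is a minimal sketch of a more defensive helper, assuming the reasoning is wrapped in `<think>...</think>` tags as in the response handled here. `strip_reasoning` is a hypothetical name and is not part of llama_index or the Groq client.

```python
import re

# Hypothetical helper: not part of llama_index or the Groq client.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> blocks and surrounding whitespace."""
    cleaned = _THINK_BLOCK.sub("", text)
    # Handle a dangling closing tag whose opening tag was never emitted.
    cleaned = cleaned.split("</think>", 1)[-1]
    return cleaned.strip()
```

With this helper, a response without any reasoning block passes through unchanged, so the same call could be used for models that do not emit `<think>` tags.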

In [17]:
result

"The meaning of life is a deeply personal and multifaceted question that varies depending on individual beliefs, experiences, and cultural contexts. Here is a structured summary of the exploration:\n\n1. **Subjectivity and Individualism**: The meaning of life is subjective and can differ for each person. It is shaped by unique experiences and beliefs, allowing individuals to create their own purpose.\n\n2. **Philosophical Perspectives**: \n   - **Existentialism**: Suggests that life has no inherent meaning, and individuals must create their own purpose.\n   - **Religious Views**: Often provide a meaning tied to serving a higher power or following specific teachings.\n\n3. **Scientific and Evolutionary Views**: From an evolutionary standpoint, life's meaning might be linked to survival and genetic continuation, though this perspective may overlook emotional and social aspects.\n\n4. **Psychological Theories**: \n   - **Maslow's Hierarchy**: Emphasizes self-actualization and realizing on

In [18]:
embed_model = FastEmbedEmbedding(model_name="thenlper/gte-base")

In [19]:
client = qdrant_client.QdrantClient(
    # location=":memory:",
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY"),
    prefer_grpc=True,
)

In [20]:
collection_name = "ComputerScienceOne"

In [21]:
if not client.collection_exists(collection_name=collection_name):
    client.create_collection(
        collection_name=collection_name,
        vectors_config=models.VectorParams(
            size=768,
            distance=models.Distance.COSINE,
            on_disk=True
        ),
        quantization_config=models.BinaryQuantization(
            binary = models.BinaryQuantizationConfig(
                always_ram=True
            )
        ),
    )

else:
    print("Collection already exists")
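
The collection above keeps the full vectors on disk (`on_disk=True`) and a binary-quantized copy in RAM (`always_ram=True`). The sketch below shows the general idea of binary quantization in plain Python, assuming sign-based binarization and Hamming distance as the coarse comparison. It is an illustration only, not Qdrant's implementation, and `binarize` and `hamming` are hypothetical names.

```python
def binarize(vector):
    """Map each component to one bit: 1 if positive, else 0."""
    return [1 if x > 0 else 0 for x in vector]


def hamming(a, b):
    """Count positions where two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Toy 4-dimensional vectors (made up for illustration).
query = binarize([0.12, -0.40, 0.33, -0.05])
close = binarize([0.10, -0.35, 0.30, -0.01])
far = binarize([-0.20, 0.45, -0.31, 0.07])

print(hamming(query, close))  # 0
print(hamming(query, far))    # 4
```

Under this scheme a 768-dimensional float32 vector (3072 bytes) reduces to 768 bits (96 bytes), which is why the quantized copy fits comfortably in RAM.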

In [22]:
embedding = []
BATCH_SIZE = 50

In [23]:
for page in range(0, len(texts), BATCH_SIZE):
    page_content = texts[page:page+BATCH_SIZE]
    response = embed_model.get_text_embedding_batch(page_content)
    embedding.extend(response)

In [24]:
for idx in range(0, len(texts), BATCH_SIZE):
    docs = texts[idx:idx+BATCH_SIZE]
    embeds = embedding[idx:idx+BATCH_SIZE]

    client.upload_collection(collection_name=collection_name,
                             vectors=embeds,
                             payload = [{"context": context} for context in docs]
                            )
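
No `ids` are passed to `upload_collection` above, so re-running the cell is likely to insert the same pages again under fresh IDs. One way to make the upload repeatable is to derive each point ID from its content. A minimal sketch, assuming UUID strings are used as point IDs; `point_id` and `RAG_NAMESPACE` are hypothetical names, and the sample texts are placeholders.

```python
import uuid

# Hypothetical namespace: any fixed UUID works as long as it never changes.
RAG_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "ComputerScienceOne")


def point_id(page_index: int, text: str) -> str:
    """Derive a stable UUID string from the page position and its text."""
    return str(uuid.uuid5(RAG_NAMESPACE, f"{page_index}:{text}"))


# Placeholder texts standing in for the `texts` list.
sample = ["page one text", "page two text"]
ids = [point_id(i, t) for i, t in enumerate(sample)]
```

The resulting list could then be sliced per batch and passed as `ids=` alongside `vectors` and `payload`, so a second run overwrites existing points instead of duplicating them.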

In [25]:
# The keyword is `optimizers_config`; `optimizer_config` is a deprecated alias.
client.update_collection(
    collection_name=collection_name,
    optimizers_config=models.OptimizersConfigDiff(
        indexing_threshold=20000
    ),
)

True

In [26]:
def search(query, k=5):
    query_embedding = embed_model.get_text_embedding(query)
    result = client.query_points(
        collection_name=collection_name,
        query=query_embedding,
        limit=k
    )
    return result
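
The `search` function above delegates ranking to Qdrant, which scores points by cosine similarity (`Distance.COSINE`). The sketch below is a brute-force version of that ranking in plain Python, useful for sanity-checking results on a handful of vectors. It is an assumption-laden illustration, not how Qdrant searches internally, and `cosine_similarity` and `top_k` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat zero vectors as dissimilar to everything.
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=5):
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]
```

For example, `top_k([1, 0], [[0, 1], [1, 0], [1, 1]], k=2)` ranks the identical vector first and the diagonal one second.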

In [27]:
relevant_docs = search("What is the concept of recursion, and how does a recursive function work?")

In [28]:
print(relevant_docs.points[0].payload["context"])

46. Recursion
PHP supports recursion with no special syntax necessary. However, recursion is generally
expensive and iterative or other non-recursive solutions are generally preferred. We
present a few examples to demonstrate how to write recursive functions in PHP.
The ﬁrst example of a recursive function we gave was the toy count down example. In
PHP it could be implemented as follows.
1 function countDown($n) {
2 if($n===0) {
3 printf("Happy New Year!\n");
4 } else {
5 printf("%d\n", $n);
6 countDown($n-1);
7 }
8 }
As another example that actually does something useful, consider the following recursive
summation function that takes an array, its size and an index variable. The recursion
works as follows: if the index variable has reached the size of the array, it stops and returns
zero (the base case). Otherwise, it makes a recursive call to recSum(), incrementing
the index variable by 1. When the function returns, it adds the result to the i-th element
in the array. To invoke this 

In [30]:
from llama_index.core import ChatPromptTemplate
from llama_index.core.llms import ChatMessage, MessageRole

message_template = [ 
    ChatMessage(
        content="""
        You are a helpful assistant. You are given a question and some context. Your task is to answer the question based on the context provided.
        """,
        role=MessageRole.SYSTEM
    ),
    ChatMessage(
        content="""
        Context information is provided below.
        {context_str}

        ------------------
        Given this context, answer the following question:
        {query}

        ------------------
        If the question cannot be answered from the provided context, say "I don't know. Not enough information to answer the question."
        """,
        role=MessageRole.USER
    )
]

In [33]:
def pipeline(query):

    # Retrieval
    relevant_documents = search(query)
    context = [doc.payload["context"] for doc in relevant_documents.points]
    context = "\n".join(context)

    # Augmentation
    chat_template = ChatPromptTemplate(message_templates=message_template)

    # Generation
    response = llm.complete(
        chat_template.format(
            context_str=context,
            query=query
        )
    )

    formatted_response = response.text.split("</think>", 1)[-1].strip()
    return formatted_response


In [34]:
print(pipeline("What is the concept of recursion, and how does a recursive function work?"))

Recursion is a programming technique where a function invokes itself to solve a problem. It breaks the problem into smaller, more manageable subproblems, each of which is an instance of the same problem. The function calls itself with modified parameters, continuing this process until it reaches a base case that stops the recursion. 

A recursive function operates by:
1. **Base Case**: A condition that, when met, stops the recursion. It provides the simplest form of the problem that can be solved directly.
2. **Recursive Step**: The function calls itself with a modified argument, reducing the problem size or altering it towards the base case.

For example, in a countdown function, the base case is when the counter reaches zero, printing a message. Each recursive step decrements the counter and calls the function again. Similarly, the Fibonacci sequence uses recursion to compute each number as the sum of the two preceding ones, with base cases for the first two numbers.

Recursion effec

In [35]:
print(pipeline("What are the different types of variables in programming as discussed in the material?"))

The different types of variables in programming, as discussed in the material, can be categorized into two main types: primitive data types and the classification based on typing.

### Primitive Data Types:
1. **Byte**: An 8-bit signed two's complement integer.
2. **Short**: A 16-bit signed two's complement integer.
3. **Int**: A 32-bit signed two's complement integer.
4. **Long**: A 64-bit signed two's complement integer.
5. **Float**: A 32-bit IEEE 754 floating-point number.
6. **Double**: A 64-bit floating-point number.
7. **Boolean**: Can be set to true or false.
8. **Char**: A 16-bit Unicode (UTF-16) character, often treated as an integer representing an ASCII value.

### Variable Classification by Typing:
1. **Statically Typed Variables**: 
   - Variables must be declared with a specific type before use.
   - Examples include Java, C, and C++. These languages enforce type safety at compile time.

2. **Dynamically Typed Variables**:
   - The type is determined by the value assigne