## **RAG from scratch (Brute Forcing RAG)**

In [1]:
import os 
import glob
from dotenv import load_dotenv
import gradio as gr
from openai import OpenAI

In [3]:
load_dotenv(override=True)

True

In [4]:
openai = OpenAI()
MODEL = 'gpt-4.1-nano'

#### Collecting information from the Knowledge Base

In [14]:
context = {}

employees = glob.glob("knowledge-base/employees/*")

for employee in employees:
    # Take the text after the last space (the surname) and drop the 3-character ".md" extension
    name = employee.split(' ')[-1][:-3]
    with open(employee, "r", encoding='utf-8') as f:
        doc = f.read()
    context[name] = doc

In [15]:
context.keys()

dict_keys(['Chen', 'Harper', 'Thomson', 'Lancaster', 'Carter', 'Tran', 'Blake', 'Bishop', 'Thompson', 'Spencer', 'Greene', 'Trenton'])

In [16]:
context['Lancaster']

"# Avery Lancaster\n\n## Summary\n- **Date of Birth**: March 15, 1985  \n- **Job Title**: Co-Founder & Chief Executive Officer (CEO)  \n- **Location**: San Francisco, California  \n\n## Insurellm Career Progression\n- **2015 - Present**: Co-Founder & CEO  \n  Avery Lancaster co-founded Insurellm in 2015 and has since guided the company to its current position as a leading Insurance Tech provider. Avery is known for her innovative leadership strategies and risk management expertise that have catapulted the company into the mainstream insurance market.  \n\n- **2013 - 2015**: Senior Product Manager at Innovate Insurance Solutions  \n  Before launching Insurellm, Avery was a leading Senior Product Manager at Innovate Insurance Solutions, where she developed groundbreaking insurance products aimed at the tech sector.  \n\n- **2010 - 2013**: Business Analyst at Edge Analytics  \n  Prior to joining Innovate, Avery worked as a Business Analyst, focusing on market trends and consumer preferenc

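> `product.split(os.sep)` splits the file path into its components at each path separator (`os.sep` is `/` on Linux and macOS, `\` on Windows). `[-1]` then picks the file name, and `[:-3]` drops the `.md` extension.

> Example:
```python
"knowledge-base/products/coffee.md".split("/")
# ["knowledge-base", "products", "coffee.md"]
```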
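As an alternative to splitting on the separator and slicing off the extension by hand, the standard library's `pathlib` can extract the file name without its extension. This is only a sketch: `doc_name` is a hypothetical helper, not part of the notebook, and the paths are illustrative.

```python
from pathlib import Path

def doc_name(path):
    """Return the file name without its extension (hypothetical helper)."""
    return Path(path).stem

# Illustrative paths, assumed to follow the knowledge-base layout used here
product_key = doc_name("knowledge-base/products/coffee.md")
employee_key = doc_name("knowledge-base/employees/Avery Lancaster.md").split(" ")[-1]

print(product_key)
print(employee_key)
```

Unlike `[:-3]`, `Path(...).stem` does not assume the extension is exactly three characters long.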

In [17]:
products = glob.glob("knowledge-base/products/*")

for product in products:
    name = product.split(os.sep)[-1][:-3]
    doc = ""
    with open(product, "r", encoding='utf-8') as f:
        doc = f.read()
    context[name] = doc

In [18]:
context.keys()

dict_keys(['Chen', 'Harper', 'Thomson', 'Lancaster', 'Carter', 'Tran', 'Blake', 'Bishop', 'Thompson', 'Spencer', 'Greene', 'Trenton', 'Carllm', 'Homellm', 'Markellm', 'Rellm'])

In [19]:
system_message = "You are an expert in answering accurate questions about Insurellm, the Insurance Tech company. Give brief, accurate answers. If you don't know the answer, say so. Do not make anything up if you haven't been provided with relevant context."

#### Brute-Forcing RAG:

In [23]:
def get_relevant_context(message):
    relevant_context = []
    for title, content in context.items():
        if title in message:
            relevant_context.append(content)
    return relevant_context

In [30]:
get_relevant_context("Who is Avery Lancaster and what is carllm?")

["# Avery Lancaster\n\n## Summary\n- **Date of Birth**: March 15, 1985  \n- **Job Title**: Co-Founder & Chief Executive Officer (CEO)  \n- **Location**: San Francisco, California  \n\n## Insurellm Career Progression\n- **2015 - Present**: Co-Founder & CEO  \n  Avery Lancaster co-founded Insurellm in 2015 and has since guided the company to its current position as a leading Insurance Tech provider. Avery is known for her innovative leadership strategies and risk management expertise that have catapulted the company into the mainstream insurance market.  \n\n- **2013 - 2015**: Senior Product Manager at Innovate Insurance Solutions  \n  Before launching Insurellm, Avery was a leading Senior Product Manager at Innovate Insurance Solutions, where she developed groundbreaking insurance products aimed at the tech sector.  \n\n- **2010 - 2013**: Business Analyst at Edge Analytics  \n  Prior to joining Innovate, Avery worked as a Business Analyst, focusing on market trends and consumer preferen

> The match is strictly case-sensitive: `carllm` in the question does not match the key `Carllm`, so the product document is not returned.

> The surname `Lancaster` alone triggers the match. A question about a hypothetical "Ivory Lancaster" would return the same document.
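One way to soften the case sensitivity is to lowercase both the titles and the words of the message before comparing. The sketch below is a variant, not the notebook's function: `get_relevant_context_ci` and `sample_context` are hypothetical names, and it matches whole words rather than substrings, which is a deliberate change in behavior.

```python
import re

def get_relevant_context_ci(message, context):
    """Return documents whose title appears as a whole word in the message, ignoring case."""
    words = set(re.findall(r"\w+", message.lower()))
    return [doc for title, doc in context.items() if title.lower() in words]

# Hypothetical stand-in for the notebook's `context` dict
sample_context = {
    "Lancaster": "# Avery Lancaster ...",
    "Carllm": "# Carllm ...",
}

matches = get_relevant_context_ci("Who is avery lancaster and what is carllm?", sample_context)
print(len(matches))
```

It still keys on a single word, so any question containing "Lancaster" retrieves the same document, whoever it is about.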

#### Inference Time !!

In [33]:
def add_context(message):
    relevant_context = get_relevant_context(message)
    if relevant_context:
        message += "\n\nThe following additional context might be relevant in answering this question:\n\n"
        for relevant in relevant_context:
            message += relevant + "\n\n"
    return message

In [36]:
print(add_context("Who is Avery Lancaster?"))

Who is Avery Lancaster?

The following additional context might be relevant in answering this question:

# Avery Lancaster

## Summary
- **Date of Birth**: March 15, 1985  
- **Job Title**: Co-Founder & Chief Executive Officer (CEO)  
- **Location**: San Francisco, California  

## Insurellm Career Progression
- **2015 - Present**: Co-Founder & CEO  
  Avery Lancaster co-founded Insurellm in 2015 and has since guided the company to its current position as a leading Insurance Tech provider. Avery is known for her innovative leadership strategies and risk management expertise that have catapulted the company into the mainstream insurance market.  

- **2013 - 2015**: Senior Product Manager at Innovate Insurance Solutions  
  Before launching Insurellm, Avery was a leading Senior Product Manager at Innovate Insurance Solutions, where she developed groundbreaking insurance products aimed at the tech sector.  

- **2010 - 2013**: Business Analyst at Edge Analytics  
  Prior to joining Innova

In [42]:
def chat(message, history):
    messages = [{'role': 'system', 'content': system_message}]
    for user_message, assistant_message in history:
        messages.append({'role':'user', 'content': user_message})
        messages.append({'role': 'assistant', 'content': assistant_message})

    message = add_context(message)
    messages.append({'role': 'user', 'content': message})

    stream = openai.chat.completions.create(model=MODEL, messages=messages, stream=True)

    response=""
    for chunk in stream:
        fragment = chunk.choices[0].delta.content or ""
        response += fragment
        yield response

In [43]:
gr.ChatInterface(fn=chat).launch()


* Running on local URL:  http://127.0.0.1:7862
* To create a public link, set `share=True` in `launch()`.


