<a href="https://colab.research.google.com/github/1uch0/LLM_Udemy_course/blob/main/RAG__Extracting_information_from_companies_LLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Expert Knowledge Worker

### A question answering agent that is an expert knowledge worker
### To be used by employees of Insurellm, an Insurance Tech company
### The agent needs to be accurate and the solution should be low cost.

This project uses RAG (Retrieval Augmented Generation) to ensure our question-answering assistant has high accuracy.

This first implementation uses a simple, brute-force type of RAG.

### Sidenote: Business applications of this week's projects

RAG is perhaps the most immediately applicable technique of anything that we cover in the course! In fact, there are commercial products that do precisely what we build this week: nuanced querying across large databases of information, such as company contracts or product specs. RAG gives you a quick-to-market, low-cost mechanism for adapting an LLM to your business area.

In [None]:
#!pip install gradio
#!pip install anthropic

Collecting anthropic
  Downloading anthropic-0.49.0-py3-none-any.whl.metadata (24 kB)
Downloading anthropic-0.49.0-py3-none-any.whl (243 kB)
Installing collected packages: anthropic
Successfully installed anthropic-0.49.0


In [None]:
# imports

import os
import sys
import io
import glob
import json
import subprocess
from typing import List

import requests
from bs4 import BeautifulSoup
#from dotenv import load_dotenv
from openai import OpenAI
import google.generativeai as genai
import anthropic
from google.colab import userdata
import gradio as gr

In [24]:
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# price is a factor for our company, so we're going to use a low cost model

MODEL = "gpt-4o-mini"

In [None]:
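api_key = userdata.get('Open_AI_API')  # 'Open_AI_API' is the name under which the OpenAI API key is saved in your Colab secrets
openai = OpenAI(api_key=api_key)
openai_api_key = api_key

# With no api_key argument, Anthropic() looks for ANTHROPIC_API_KEY in the environment
claude = anthropic.Anthropic()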

OPENAI_MODEL = "gpt-4o-mini"
CLAUDE_MODEL = "claude-3-haiku-20240307"

In [49]:
# With massive thanks to student Dr John S. for fixing a bug in the below for Windows users!

context = {}

employees = glob.glob("/content/drive/MyDrive/Colab Notebooks/LLM Engineering course/knowledge-base/employees/*")

for employee in employees:
    name = os.path.splitext(os.path.basename(employee))[0]  # file name without the extension
    doc = ""
    with open(employee, "r", encoding="utf-8") as f:
        doc = f.read()
    context[name] = doc  # store the document text under the employee's name

In [50]:
print(employees)

['/content/drive/MyDrive/Colab Notebooks/LLM Engineering course/knowledge-base/employees/Alex Thomson.md', '/content/drive/MyDrive/Colab Notebooks/LLM Engineering course/knowledge-base/employees/Samantha Greene.md', '/content/drive/MyDrive/Colab Notebooks/LLM Engineering course/knowledge-base/employees/Alex Harper.md', '/content/drive/MyDrive/Colab Notebooks/LLM Engineering course/knowledge-base/employees/Jordan K. Bishop.md', '/content/drive/MyDrive/Colab Notebooks/LLM Engineering course/knowledge-base/employees/Avery Lancaster.md', '/content/drive/MyDrive/Colab Notebooks/LLM Engineering course/knowledge-base/employees/Jordan Blake.md', '/content/drive/MyDrive/Colab Notebooks/LLM Engineering course/knowledge-base/employees/Oliver Spencer.md', '/content/drive/MyDrive/Colab Notebooks/LLM Engineering course/knowledge-base/employees/Alex Chen.md', '/content/drive/MyDrive/Colab Notebooks/LLM Engineering course/knowledge-base/employees/Samuel Trenton.md', '/content/drive/MyDrive/Colab Noteb

In [55]:
context["Alex Thomson"]   # keys are the employees' full names (file names without the extension)

'# HR Record\n\n# Alex Thomson\n\n## Summary\n- **Date of Birth:** March 15, 1995  \n- **Job Title:** Sales Development Representative (SDR)  \n- **Location:** Austin, Texas  \n\n## Insurellm Career Progression\n- **November 2022** - Joined Insurellm as a Sales Development Representative. Alex Thomson quickly adapted to the team, demonstrating exceptional communication and rapport-building skills.\n- **January 2023** - Promoted to Team Lead for special projects due to Alex\'s initiative in driving B2B customer outreach programs.  \n- **August 2023** - Developed a training module for new SDRs at Insurellm, enhancing onboarding processes based on feedback and strategies that Alex Thomson pioneered.  \n- **Current** - Continues to excel in the role, leading a small team of 5 SDRs while collaborating closely with the marketing department to identify new lead-generation strategies.  \n\n## Annual Performance History  \n- **2022** - Rated as "Exceeds Expectations." Alex Thomson achieved 150%

In [56]:
products = glob.glob("/content/drive/MyDrive/Colab Notebooks/LLM Engineering course/knowledge-base/products/*")

for product in products:
    name = os.path.splitext(os.path.basename(product))[0]  # product name = file name without the extension
    doc = ""
    with open(product, "r", encoding="utf-8") as f:
        doc = f.read()
    context[name]=doc

In [57]:
context.keys()

dict_keys(['Alex Thomson', 'Samantha Greene', 'Alex Harper', 'Jordan K. Bishop', 'Avery Lancaster', 'Jordan Blake', 'Oliver Spencer', 'Alex Chen', 'Samuel Trenton', 'Emily Tran', 'Emily Carter', 'Maxine Thompson', 'Carllm', 'Rellm', 'Homellm', 'Markellm'])
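
The two loading loops above follow the same pattern: list a folder, read each file, and store the text under the file name without its extension. As a minimal sketch, that pattern could be folded into one helper. The name `load_documents`, the `*.md` filter, and the temporary demo files are assumptions for illustration, not part of the original notebook.

```python
from pathlib import Path
import tempfile


def load_documents(folder: str) -> dict[str, str]:
    """Map each Markdown file's stem (name without extension) to its text."""
    docs = {}
    for path in sorted(Path(folder).glob("*.md")):
        docs[path.stem] = path.read_text(encoding="utf-8")
    return docs


# Demo with throwaway files standing in for the knowledge-base folders
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Alex Thomson.md").write_text("# HR Record", encoding="utf-8")
    (Path(tmp) / "Carllm.md").write_text("# Product Summary", encoding="utf-8")
    demo_context = load_documents(tmp)

print(sorted(demo_context))
```

With a helper like this, the employees and products folders could each be loaded with one call and merged into `context` with `context.update(...)`.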

In [33]:
system_message = "You are an expert in answering accurate questions about Insurellm, the Insurance Tech company. Give brief, accurate answers. If you don't know the answer, say so. Do not make anything up if you haven't been provided with relevant context."

In [58]:
# Takes a message and scans the context dictionary, returning every document whose title (employee or product name) appears in the message


def get_relevant_context(message):
    relevant_context = []
    for context_title, context_details in context.items():
        if context_title.lower() in message.lower():
            relevant_context.append(context_details)
    return relevant_context

In [63]:
get_relevant_context("Who is Alex Thomson?")

[]

In [68]:
get_relevant_context("Who is Avery and what is carllm?")

['# Product Summary\n\n# Carllm\n\n## Summary\n\nCarllm is an innovative auto insurance product developed by Insurellm, designed to streamline the way insurance companies offer coverage to their customers. Powered by cutting-edge artificial intelligence, Carllm utilizes advanced algorithms to deliver personalized auto insurance solutions, ensuring optimal coverage while minimizing costs. With a robust infrastructure that supports both B2B and B2C customers, Carllm redefines the auto insurance landscape and empowers insurance providers to enhance customer satisfaction and retention.\n\n## Features\n\n- **AI-Powered Risk Assessment**: Carllm leverages artificial intelligence to analyze driver behavior, vehicle conditions, and historical claims data. This enables insurers to make informed decisions and set competitive premiums that reflect true risk profiles.\n\n- **Instant Quoting**: With Carllm, insurance companies can offer near-instant quotes to customers, enhancing the customer exper
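
The result above contains only the Carllm document: the lookup needs the full title to appear in the message, and "Avery" alone is not the full key "Avery Lancaster". One possible alternative, sketched below, matches on individual words instead. The function name `find_relevant_titles` and the sample title list are hypothetical, and this is not the approach the notebook uses.

```python
import re


def find_relevant_titles(message: str, titles: list[str]) -> list[str]:
    """Return the titles that share at least one whole word with the message."""
    message_words = set(re.findall(r"[a-z0-9]+", message.lower()))
    matches = []
    for title in titles:
        title_words = set(re.findall(r"[a-z0-9]+", title.lower()))
        # Skip single-letter tokens such as the initial in "Jordan K. Bishop"
        title_words = {word for word in title_words if len(word) > 1}
        if title_words & message_words:
            matches.append(title)
    return matches


titles = ["Alex Thomson", "Avery Lancaster", "Carllm", "Alex Harper"]
print(find_relevant_titles("Who is Avery and what is carllm?", titles))
# → ['Avery Lancaster', 'Carllm']
```

The trade-off is precision: a question that mentions only "Alex" would pull in every employee named Alex, which means more tokens sent to the model and a higher cost per request.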

In [65]:
def add_context(message):
    relevant_context = get_relevant_context(message)
    if relevant_context:
        message += "\n\nThe following additional context might be relevant in answering this question:\n\n"
        for relevant in relevant_context:
            message += relevant + "\n\n"
    return message

In [69]:
print(add_context("Who is Alex Thomson?"))

Who is Alex Thomson?

The following additional context might be relevant in answering this question:

# HR Record

# Alex Thomson

## Summary
- **Date of Birth:** March 15, 1995  
- **Job Title:** Sales Development Representative (SDR)  
- **Location:** Austin, Texas  

## Insurellm Career Progression
- **November 2022** - Joined Insurellm as a Sales Development Representative. Alex Thomson quickly adapted to the team, demonstrating exceptional communication and rapport-building skills.
- **January 2023** - Promoted to Team Lead for special projects due to Alex's initiative in driving B2B customer outreach programs.  
- **August 2023** - Developed a training module for new SDRs at Insurellm, enhancing onboarding processes based on feedback and strategies that Alex Thomson pioneered.  
- **Current** - Continues to excel in the role, leading a small team of 5 SDRs while collaborating closely with the marketing department to identify new lead-generation strategies.  

## Annual Performanc

In [70]:
def chat(message, history):
    messages = [{"role": "system", "content": system_message}] + history
    message = add_context(message)
    messages.append({"role": "user", "content": message})

    stream = openai.chat.completions.create(model=MODEL, messages=messages, stream=True)

    response = ""
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        yield response
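
To make the shape of the request easier to inspect, here is a small sketch that assembles the same kind of message list as `chat` without calling the API. The helper `build_messages`, the sample history, and the shortened system prompt are assumptions made for this illustration.

```python
def build_messages(system_message, history, user_message, context_docs):
    """Assemble the payload: system prompt, prior turns, then the augmented question."""
    if context_docs:
        user_message += "\n\nThe following additional context might be relevant in answering this question:\n\n"
        user_message += "\n\n".join(context_docs)
    return (
        [{"role": "system", "content": system_message}]
        + list(history)  # copy, so the caller's history is left untouched
        + [{"role": "user", "content": user_message}]
    )


history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]
messages = build_messages(
    "You answer questions about Insurellm.",
    history,
    "What is Carllm?",
    ["# Product Summary\n\n# Carllm"],
)
for m in messages:
    print(m["role"], "->", len(m["content"]), "characters")
```

Only the newest user turn carries retrieved context here. Assuming the history holds the messages as the user typed them, documents attached to earlier turns are not sent again, which keeps follow-up requests smaller.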

## Now we will bring this up in Gradio using the Chat interface

A quick and easy way to prototype a chat with an LLM

In [71]:
view = gr.ChatInterface(chat, type="messages").launch()

Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://cecf0b8cd35f698d37.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
