Now we'll build Data, but as an LLM Agent that can use tools outside of the LLM. In addition to using RAG to access "long term memory" of everything Data ever said in tha past, we'll give him a couple of additional capabilities: performing numeric calculations (which LLM's tend to struggle with on their own,) and access current information via a web search.

We will start by parsing the original scripts and extracting lines spoken by Data. As before, you will need to upload all of the script files into a tng folder within your sample_data folder in your CoLab workspace first.

An archive can be found at https://www.st-minutiae.com/resources/scripts/ (look for "All TNG Epsiodes"), but you could easily adapt this to read scripts from your favorite character from your favorite TV show or movie instead.

We've done all of this before, so I'm combining all of the code to load up Data's past lines into a vector store together below. We'll pick up again as we incorporate this into an agent. If you need an explanation of how we are populating this vector store, refer to the earlier RAG exercises in the course.

Also be sure to provide your own OpenAI secret key. Click on the little key icon in CoLab and add a "secret" for OPENAI_API_KEY that points to your secret key.

In [53]:
!pip install openai --upgrade
!pip install langchain_openai langchain_experimental



In [1]:
import os
import re
import random
import openai

from langchain_classic.indexes import VectorstoreIndexCreator
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_experimental.text_splitter import SemanticChunker

dialogues = []

def strip_parentheses(s: str) -> str:
    """Remove parentheses and their contents from string"""
    return re.sub(r'\(.*?\)', '', s)

def is_single_word_all_caps(s: str) -> bool:
    """Check if string is a single word in all caps (character name)"""
    words = s.split()
    
    if len(words) != 1:
        return False
    
    # Make sure it isn't a line number
    if bool(re.search(r'\d', words[0])):
        return False
    
    return words[0].isupper()

def extract_character_lines(file_path: str, dialoguesLines: list):
    """Extract dialogue lines for a specific character from script file"""
    lines = []
    try:
        with open(file_path, 'r', encoding='utf-8', errors='ignore') as script_file:
            lines = script_file.readlines()
            print(f"lines: {lines[0]}")
    except Exception as e:
        print(f"Error reading {file_path}: {e}")
        return
    
    # is_character_line = False
    # current_line = ''
    # current_character = ''
    
    # for line in lines:
    #     stripped_line = line.strip()
    #     if is_single_word_all_caps(stripped_line):
    #         is_character_line = True
    #         current_character = stripped_line
    #     elif (line.strip() == '') and is_character_line:
    #         is_character_line = False
    #         dialog_line = strip_parentheses(current_line).strip()
    #         dialog_line = dialog_line.replace('"', "'")
    #         if current_character == character_name and len(dialog_line) > 0:
    #             dialogues.append(dialog_line)
    #         current_line = ''
    #     elif is_character_line:
    #         current_line += line.strip() + ' '

    for line in lines:
        stripped_line = line.strip()        
        # dialog_line = strip_parentheses(current_line).strip()
        # dialog_line = dialog_line.replace('"', "'")
        dialogues.append(stripped_line)
        # current_line = ''

def process_directory(directory_path: str) -> list:
    """Process all script files in directory and extract character lines"""
    dialogues = []
    
    if not os.path.exists(directory_path):
        print(f"Warning: Directory {directory_path} does not exist")
    
    for filename in os.listdir(directory_path):
        file_path = os.path.join(directory_path, filename)
        if os.path.isfile(file_path):
            extract_character_lines(file_path, dialogues)
    
    print(f"Extracted {len(dialogues)} dialogue lines for Agent")

process_directory("./sample_data/AgentProcess")

# Access the API key from the environment variable
openai_api_key = os.getenv('OPENAI_API_KEY')
tavily_api_key = os.getenv('TAVILY_API_KEY')

# Initialize the OpenAI API client
openai.api_key = openai_api_key

# Write our extracted lines for Data into a single file, to make
# life easier for langchain.
print(f"dialogues 1: {dialogues[0]}")
with open("./sample_data/AgentProcess/Agent_process.txt", "w+") as f:
    for line in dialogues:
        f.write(line + "\n")


text_splitter = SemanticChunker(OpenAIEmbeddings(openai_api_key=openai_api_key), breakpoint_threshold_type="percentile")
with open("./sample_data/AgentProcess/Agent_process.txt") as f:
  data_lines = f.read()
docs = text_splitter.create_documents([data_lines])

embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
index = VectorstoreIndexCreator(embedding=embeddings).from_documents(docs)

lines: Handling insurance claims can often seem overwhelming, but with the right guidance and tools, it doesn’t have to be. At Chenango Brokers, we know the challenges you face and are committed to making the process as smooth and efficient as possible. This guide is designed to provide you with practical insights and strategies to navigate the complex world of insurance claims handling with confidence.

Extracted 0 dialogue lines for Agent
dialogues 1: Handling insurance claims can often seem overwhelming, but with the right guidance and tools, it doesn’t have to be. At Chenango Brokers, we know the challenges you face and are committed to making the process as smooth and efficient as possible. This guide is designed to provide you with practical insights and strategies to navigate the complex world of insurance claims handling with confidence.


  index = VectorstoreIndexCreator(embedding=embeddings).from_documents(docs)


Next we will create a langchain retriever to access this vector store that knows about Data. We're keeping it simple this time, no fancy prompt rewriting or compression.

In [2]:
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(openai_api_key=openai_api_key, temperature=0)

system_prompt = (
    "You are an insurance Sales Agent working for LivEasy Insurance, trying to sell insurance to the customer. "
    "Use the given context to get information about the insurance process, claim process, quote process "
    "Use three sentence maximum and keep the answer concise. "
    "Context: {context}"
)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

retriever=index.vectorstore.as_retriever(search_kwargs={'k': 10})


In [4]:
# Cell: Inspect chunks
print(f"Total chunks created: {len(docs)}")
for i, doc in enumerate(docs[:3]):  # Show first 3 chunks
    print(f"\n--- Chunk {i+1} ---")
    print(doc.page_content[:200])

Total chunks created: 5

--- Chunk 1 ---
Handling insurance claims can often seem overwhelming, but with the right guidance and tools, it doesn’t have to be. At Chenango Brokers, we know the challenges you face and are committed to making th

--- Chunk 2 ---
Your clients rely on you during their most vulnerable moments. By being transparent, empathetic, and proactive, you can build a solid foundation of trust. This trust is vital for client retention and 

--- Chunk 3 ---
Even with the best preparation, challenges can arise. Here are some common issues and how to handle them:
Delays and denials are frustrating for everyone involved. Keep your clients informed about pot


Let's make sure it works:

In [5]:
# retriever.invoke("LiveEasy insurance")[0]

results = retriever.invoke("What information you need for Business insurance?")

if results:
    print("Found results:")
    print(results[0].page_content)
else:
    print("No results found. Try a different query.")
    
    # Try with insurance-related query
    results2 = retriever.invoke("insurance")
    if results2:
        print("\nTrying 'insurance' query:")
        print(results2[0].page_content)

Found results:
waivers & disclaimers? Business Insurance,Vehicle Insurance, What is your credit rating? Business Insurance,Vehicle Insurance, Have you experienced any financial difficulties or bankruptcies? Business Insurance,What is the reason for seeking new insurance coverage? Business Insurance, How often do you conduct safety audits? Business Insurance, Have you experienced any data breaches? What have you done to safeguard data? Business Insurance, Is your work seasonally impacted? Business Insurance,Vehicle Insurance, may also request bank statements.


Now we will create a tool from this retriever that our agent can use. We'll let it know that it can answer questions about Data. Think of this as the agent's "long term memory."

In [10]:
from langchain_classic.tools.retriever import create_retriever_tool

retriever_tool_InsProcess = create_retriever_tool(
    retriever, "Agent_Process",
    "Search for information about Insurance Process, claim process while talking to a user"
)