# Question Answering Chatbot Version 2 
Goal: Create a Q&A chatbot for fact checking using LangChain and its AgentExecutor, an OpenAI-style functions agent, HuggingFace (or Ollama), and a FAISS vector store

## RAG Pipeline 1
User type: Commentators, government, businesses

Data: Single-source PDF files (e.g., relevant scripts and recordings from past cases)

Input1: User Query

Output1: Answers to any question about the content of the data source



Scenarios:
- Political Debates 
- Court/Crime Cases
- Customer Service


Options:
- Out of the box model
- Fine-Tune a model
- Retrain model
- Create model from scratch




### Agent-Based Approach vs RAG Chain-Based Approach

Agent-Based Approach -> use an OpenAI-style agent when several tools serve as data sources, e.g. Wikipedia, arXiv, websites, PDFs, etc.
```
from langchain.agents import AgentExecutor, create_openai_tools_agent

# chat_ollama, tools, and premade_prompt are assumed to be defined earlier
agent = create_openai_tools_agent(llm=chat_ollama, tools=tools, prompt=premade_prompt)

# The executor runs the agent loop: pick a tool, call it, feed the result back
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

agent_executor.invoke({"input": "Hi"})
```

RAG Chain-Based Approach -> an agent isn't necessary when there is a single data source; a retrieval chain is enough
```
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

# llm_model, prompt, and db are assumed to be defined earlier
# "Stuff" chain: inserts all retrieved documents into the prompt's context
docs_chain = create_stuff_documents_chain(llm=llm_model, prompt=prompt)

retriever = db.as_retriever()

# Retrieval chain: fetch documents for the input, then pass them to docs_chain
retrieval_chain = create_retrieval_chain(retriever, docs_chain)

retrieval_chain.invoke({"input": "For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline."})
```
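
The difference between the two approaches can be sketched in plain Python, without LangChain. This is only a toy illustration: `pdf_search`, `wiki_search`, `rag_chain`, and `agent` are hypothetical names, and the keyword check stands in for the tool choice that an LLM would make in a real agent.

```python
# Toy illustration only: keyword routing stands in for the LLM's tool choice.
from typing import Callable, Dict


def pdf_search(query: str) -> str:
    """Hypothetical stand-in for a retriever over a single PDF source."""
    return f"[pdf] passages matching: {query}"


def wiki_search(query: str) -> str:
    """Hypothetical stand-in for a Wikipedia lookup tool."""
    return f"[wiki] summary for: {query}"


def rag_chain(query: str) -> str:
    """Chain-based: one fixed path, always retrieve from the single source."""
    return pdf_search(query)


def agent(query: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Agent-based: choose a tool per query, then call it."""
    tool_name = "pdf" if "health first" in query.lower() else "wiki"
    return tools[tool_name](query)


tools = {"pdf": pdf_search, "wiki": wiki_search}
chain_answer = rag_chain("What is a KPI?")
agent_answer = agent("What is a KPI?", tools)
```

The chain always hits the PDF source, while the agent may route the same question elsewhere. That flexibility is the reason to accept the extra complexity of an agent only when there is more than one source.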


In [1]:
# Data Sources - Web, PDF, Wikipedia
from langchain_community.document_loaders import PyPDFDirectoryLoader, WebBaseLoader, WikipediaLoader
# Searches the Wikipedia API
from langchain_community.tools.wikipedia.tool import WikipediaQueryRun
# Wrapper for Wikipedia API to help conduct relevant searches
from langchain_community.utilities.wikipedia import WikipediaAPIWrapper
# Database 
from langchain_community.vectorstores import FAISS
# Text Splitter
from langchain.text_splitter import RecursiveCharacterTextSplitter
# LLM 
from langchain_community.llms import Ollama
# Chat Models
from langchain_community.chat_models import ChatOllama
# Hub for additional models and/or prompts
from langchain import hub
# Embedding Model
from langchain_community.embeddings import OllamaEmbeddings
# Prompt
from langchain.prompts import PromptTemplate, ChatMessagePromptTemplate
# Documents Chain
from langchain.chains.combine_documents import create_stuff_documents_chain
# Retriever Chain
from langchain.chains import create_retrieval_chain
# Retrieval Tool
from langchain.tools.retriever import create_retriever_tool
# OpenAI functions agent and Agent Executor
from langchain.agents import AgentExecutor, create_openai_functions_agent

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [9]:
# Wiki Data Source 
wiki_wrapper = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=150)
wiki_tool = WikipediaQueryRun(api_wrapper=wiki_wrapper)

In [10]:
# # Web Base Source
# web_loader = WebBaseLoader(web_path="https://lakecountyin.gov/departments/health/Health-First-Indiana/hfi-faq")
# web_tool = web_loader.load()
# web_tool

In [11]:
# PDF Image Source 
pdf_loader = PyPDFDirectoryLoader(path="./pdf_data_source_2/")
docs = pdf_loader.load()
docs

[Document(metadata={'source': 'pdf_data_source_2/Health+First+Indiana+Presentation+071224.pdf', 'page': 0}, page_content='Health First Lake County Plan\n2024-2025\n'),
 Document(metadata={'source': 'pdf_data_source_2/Health+First+Indiana+Presentation+071224.pdf', 'page': 1}, page_content='Lake County Scorecard\n'),
 Document(metadata={'source': 'pdf_data_source_2/Health+First+Indiana+Presentation+071224.pdf', 'page': 2}, page_content='• KPI: Number of counties that through a tobacco prevention and cessation \ncoalition have a comprehensive program to address youth tobacco and \naddictive nicotine prevention.\nLCHD meets KPI\nPLAN AND PARTNERSHIPS: \n• LCHD partnered with the Lake County Tobacco Coalition  (Community \nAdvocates of Northern Indiana and Franciscan Health)\n'),
 Document(metadata={'source': 'pdf_data_source_2/Health+First+Indiana+Presentation+071224.pdf', 'page': 3}, page_content='• KPI: Number of counties with documented process to refer families to needed \nservices inc

In [12]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
pdf_docs_split = text_splitter.split_documents(docs)
pdf_docs_split

[Document(metadata={'source': 'pdf_data_source_2/Health+First+Indiana+Presentation+071224.pdf', 'page': 0}, page_content='Health First Lake County Plan\n2024-2025'),
 Document(metadata={'source': 'pdf_data_source_2/Health+First+Indiana+Presentation+071224.pdf', 'page': 1}, page_content='Lake County Scorecard'),
 Document(metadata={'source': 'pdf_data_source_2/Health+First+Indiana+Presentation+071224.pdf', 'page': 2}, page_content='• KPI: Number of counties that through a tobacco prevention and cessation \ncoalition have a comprehensive program to address youth tobacco and \naddictive nicotine prevention.\nLCHD meets KPI\nPLAN AND PARTNERSHIPS: \n• LCHD partnered with the Lake County Tobacco Coalition  (Community \nAdvocates of Northern Indiana and Franciscan Health)'),
 Document(metadata={'source': 'pdf_data_source_2/Health+First+Indiana+Presentation+071224.pdf', 'page': 3}, page_content='• KPI: Number of counties with documented process to refer families to needed \nservices including
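
In the output above, the split documents look the same as the loaded pages apart from the trailing newline, presumably because each page is shorter than `chunk_size=500`. To see what `chunk_size` and `chunk_overlap` do on longer text, here is a minimal sliding-window sketch. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences, which this hypothetical `split_with_overlap` helper does not.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> List[str]:
    """Fixed-size character windows; each window repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 120  # 1200 characters of dummy text
chunks = split_with_overlap(sample)
```

With these settings the 1200-character sample yields three chunks, and the last 100 characters of one chunk are the first 100 of the next.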

In [13]:
llm = ChatOllama(model="llama3")


In [17]:
# API key redacted; load it from an environment variable rather than hard-coding it.
# prompt = hub.pull("hwchase17/openai-functions-agent", api_key="<REDACTED>")
prompt = hub.pull("hwchase17/openai-functions-agent")
from langchain.prompts import ChatPromptTemplate

prompt2 = ChatPromptTemplate.from_template(
    template=""" 
            Answer the question or query to the best of your knowledge.
            Can you also make your answer rhyme if the result is more than six words. 
            <context>
            {context}
            </context>
            ASP: {agent_scratchpad}
            Question: {input}

            """)
prompt2



ChatPromptTemplate(input_variables=['agent_scratchpad', 'context', 'input'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['agent_scratchpad', 'context', 'input'], input_types={}, partial_variables={}, template=' \n            Answer the question or query to the best of your knowledge.\n            Can you also make your answer rhyme if the result is more than six words. \n            <context>\n            {context}\n            </context>\n            ASP: {agent_scratchpad}\n            Question: {input}\n\n            '), additional_kwargs={})])
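
The `input_variables` list in the output comes from the `{...}` placeholders in the template. A rough sketch of that idea using only the standard library is below. `TEMPLATE` and `template_variables` are made up for illustration, and LangChain's own parsing may differ in its details.

```python
import string
from typing import List

# Shortened, hypothetical template with the same placeholder style
TEMPLATE = (
    "Answer the question using only the context.\n"
    "<context>\n{context}\n</context>\n"
    "Question: {input}\n"
)


def template_variables(template: str) -> List[str]:
    """Collect the placeholder names that must be supplied when formatting."""
    parsed = string.Formatter().parse(template)
    return sorted({name for _, name, _, _ in parsed if name})


variables = template_variables(TEMPLATE)
filled = TEMPLATE.format(context="Lake County Scorecard", input="What is scored?")
```

Every placeholder has to be filled at call time, so a template that declares `{context}` needs something upstream (a retriever, for example) to provide it.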

In [26]:
# Create a docs tool instead of a chain
# docs_chain = create_stuff_documents_chain(llm, prompt2)
# docs_chain

In [22]:
db = FAISS.from_documents(pdf_docs_split, embedding=OllamaEmbeddings())
db


<langchain_community.vectorstores.faiss.FAISS at 0x127887e10>

In [23]:
# Retriever interacts with the vector store and sends data back to the agent
retriever = db.as_retriever()
retriever

VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x127887e10>, search_kwargs={})
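
Conceptually, the retriever embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below imitates that with a bag-of-words "embedding" and cosine similarity. Both are stand-ins chosen for readability: `OllamaEmbeddings` produces dense vectors, and FAISS may use a different distance metric. The `embed`, `cosine`, and `top_k` helpers are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real embedding model returns dense floats."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Score every chunk against the query and keep the k best."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


chunks = [
    "Health First Lake County Plan 2024-2025",
    "Lake County Scorecard",
    "LCHD partnered with the Lake County Tobacco Coalition",
]
results = top_k("tobacco coalition partners", chunks, k=1)
```

Only the third chunk shares words with the query, so it is the one returned.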

In [24]:
# Docs tools
retriever_tool = create_retriever_tool(
    retriever=retriever,
    name="Health First",
    description="tool for pdf files of Lake County Indiana's Health First Program"
)
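
One thing to watch: OpenAI's function-calling API restricts function names to letters, digits, underscores, and hyphens (up to 64 characters), so a tool name containing a space, such as "Health First", may be rejected if this agent is later pointed at an OpenAI model. A small, hypothetical helper for producing a safe name could look like this:

```python
import re

# Pattern for function names, as understood from OpenAI's function-calling docs
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]{1,64}$")


def safe_tool_name(name: str) -> str:
    """Replace runs of disallowed characters with underscores and cap the length."""
    cleaned = re.sub(r"[^a-zA-Z0-9_-]+", "_", name.strip()).strip("_")
    if not cleaned:
        raise ValueError("tool name is empty after cleaning")
    return cleaned[:64]


original = "Health First"
cleaned = safe_tool_name(original)
```

The cleaned value could then be passed as `name=` to `create_retriever_tool`.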

In [25]:
tools = [wiki_tool, retriever_tool]
tools

[WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper(wiki_client=<module 'wikipedia' from '/Users/druestaples/.pyenv/versions/3.11.9/envs/langchain_qa_chatbot_env_1/lib/python3.11/site-packages/wikipedia/__init__.py'>, top_k_results=1, lang='en', load_all_available_meta=False, doc_content_chars_max=150)),
 Tool(name='Health First', description="tool for pdf files of Lake County Indiana's Health First Program", args_schema=<class 'langchain_core.tools.retriever.RetrieverInput'>, func=functools.partial(<function _get_relevant_documents at 0x121c43ec0>, retriever=VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x127887e10>, search_kwargs={}), document_prompt=PromptTemplate(input_variables=['page_content'], input_types={}, partial_variables={}, template='{page_content}'), document_separator='\n\n'), coroutine=functools.partial(<function _aget_relevant_documents at 0x123cfede0>, retriever=VectorStoreRetriever(tags=['

In [27]:
# Create Agent
agent = create_openai_functions_agent(llm, tools, prompt)
agent

RunnableAssign(mapper={
  agent_scratchpad: RunnableLambda(lambda x: format_to_openai_function_messages(x['intermediate_steps']))
})
| ChatPromptTemplate(input_variables=['agent_scratchpad', 'input'], optional_variables=['chat_history'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag(tag='HumanMessageChunk')], typing.Annotated[langchain_core.mess

In [30]:
# Allow agent to be invoked
agent_executor = AgentExecutor(agent=agent, tools=tools)
agent_executor

AgentExecutor(verbose=False, agent=RunnableAgent(runnable=RunnableAssign(mapper={
  agent_scratchpad: RunnableLambda(lambda x: format_to_openai_function_messages(x['intermediate_steps']))
})
| ChatPromptTemplate(input_variables=['agent_scratchpad', 'input'], optional_variables=['chat_history'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag(tag='

In [31]:
response = agent_executor.invoke({"input": "What is Health First Indiana?"})
response

{'input': 'What is Health First Indiana?',
 'output': 'HealthFirst Indiana (HFI) is a non-profit organization that aims to improve the health and well-being of individuals and communities in Indiana. They provide a range of services, including:\n\n1. Medicaid managed care: HFI operates Medicaid managed care programs for low-income Hoosiers, which means they coordinate healthcare services for members, ensuring they receive necessary medical attention while keeping costs under control.\n2. Behavioral health services: HFI offers mental health and substance abuse treatment to individuals and families, helping them overcome challenges like addiction, depression, anxiety, and other behavioral health issues.\n3. Case management: Their case managers work with individuals who have complex healthcare needs, providing guidance on navigating the healthcare system, accessing resources, and achieving their wellness goals.\n4. Community outreach and education: HFI engages in community-based initiativ

In [32]:
response2 = agent_executor.invoke({"input": "What is a KPI and how many of them are there?"})
response2

{'input': 'What is a KPI and how many of them are there?',
 'output': "A Key Performance Indicator (KPI) is a measurable value that demonstrates how effectively an organization or individual is achieving its goals. KPIs are used to evaluate performance, track progress, and make data-driven decisions.\n\nThe number of KPIs can vary greatly depending on the organization, industry, and specific goals. There is no one-size-fits-all answer to the number of KPIs, as it depends on the complexity of the organization and its objectives.\n\nThat being said, here are some general guidelines:\n\n1. For a small business or startup: 5-10 KPIs\nThese might include metrics such as revenue growth, customer acquisition rate, website traffic, social media engagement, and employee satisfaction.\n2. For a medium-sized business: 10-20 KPIs\nAdditional metrics might include product development time, sales conversion rates, inventory turnover, and marketing ROI.\n3. For a large corporation or enterprise: 20-5

In [33]:
response3 = agent_executor.invoke({"input": "What KPI's were mentioned in Health First Indiana?"})
response3

{'input': "What KPI's were mentioned in Health First Indiana?",
 'output': "According to the case study on Health First Indiana, some of the key performance indicators (KPIs) mentioned include:\n\n1. Patient Engagement: Measuring patients' participation in their care through metrics such as appointment attendance rates and patient activation scores.\n2. Readmissions: Tracking hospital readmission rates to identify areas for improvement and reduce unnecessary rehospitalizations.\n3. Cost per Member: Monitoring the total healthcare costs for each member to ensure that Health First Indiana is providing high-quality care at a reasonable cost.\n4. Quality of Care: Measuring clinical quality through metrics such as HbA1c levels, blood pressure control, and cholesterol management for patients with diabetes or hypertension.\n5. Patient Satisfaction: Tracking patient satisfaction rates through surveys and feedback forms to ensure that members are receiving the care they need and expect.\n6. Acc

In [36]:
response4 = agent_executor.invoke({"input": "what are rates for adult obesity in Lake County and Indiana according the the Health First Indiana file?"})
response4

{'input': 'what are rates for adult obesity in Lake County and Indiana according the the Health First Indiana file?',
 'output': "According to the Health First Indiana File, which provides data on health indicators across different regions of Indiana, including Lake County, here are the rates for adult obesity:\n\n**Lake County:**\n\n* Adult obesity rate (2019-2020): 34.4%\n\t+ Male obesity rate: 37.3%\n\t+ Female obesity rate: 31.6%\n\n**Indiana State:**\n\n* Adult obesity rate (2019-2020): 35.5%\n\t+ Male obesity rate: 38.2%\n\t+ Female obesity rate: 32.8%\n\nPlease note that these rates are based on data from the Centers for Disease Control and Prevention's (CDC) Behavioral Risk Factor Surveillance System (BRFSS). The BRFSS is a continuous, random-digit-dialed telephone survey of adults aged 18 years and older.\n\nIt's worth mentioning that Lake County has a slightly lower obesity rate compared to the state average. However, both rates are above the national average for adult obesit
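
Because an agent can answer from the model's own knowledge without calling any tool, it can be useful to check whether specific figures in an answer actually appear in the retrieved text. The sketch below is a crude, hypothetical check (`unsupported_numbers` is not a LangChain function) that flags numbers in an answer that are missing from the context chunks. The sample context reuses the two page snippets shown earlier, not the full PDF.

```python
import re
from typing import List


def unsupported_numbers(answer: str, context_chunks: List[str]) -> List[str]:
    """Return numeric strings from the answer that never occur in the context."""
    context = " ".join(context_chunks)
    numbers = re.findall(r"\d+(?:\.\d+)?%?", answer)
    # Plain substring test: simple, but it can miss reformatted numbers
    return [n for n in numbers if n not in context]


context_chunks = ["Lake County Scorecard", "Health First Lake County Plan 2024-2025"]
answer = "Adult obesity rate (2019-2020): 34.4%"
flagged = unsupported_numbers(answer, context_chunks)
```

Anything flagged is only a hint to look closer. Passing `return_intermediate_steps=True` to `AgentExecutor` is another way to see which tools, if any, were called for a given answer.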