# Case Fact Checker
Goal: Create a Q&A chatbot for fact-checking using LangChain, HuggingFace (or Ollama), and FAISS DB

## RAG Pipeline 1
User type: Commentators, government officials

Data: Relevant scripts and recordings from past cases. 

Input1: User Query

Output1: Answers any question regarding the current case.


## RAG Pipeline 2
User type: Debate candidates, documented witnesses, suspects, bystanders, etc.

Data: Relevant scripts and recordings from past cases. 

Input2: User prompt

Output2: Determines what share of the prompt is true, false, and undecided, as percentages that sum to 100%. It also shows those percentages for each individual sentence and points out any fallacies involved.
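
One way to read the Output2 requirement is that every sentence gets its own true/false/undecided scores, and the overall result is the average across sentences. The sketch below assumes the per-sentence scores already exist; `SentenceVerdict` and `overall_percentages` are hypothetical names, the scores are made up for illustration, and how a model would actually produce them is left open.

```python
from dataclasses import dataclass


@dataclass
class SentenceVerdict:
    """Hypothetical container for one sentence's scores (each in percent)."""
    sentence: str
    true_pct: float
    false_pct: float
    undecided_pct: float

    def __post_init__(self):
        # Enforce the "percentages sum to 100" rule per sentence
        total = self.true_pct + self.false_pct + self.undecided_pct
        if abs(total - 100.0) > 1e-6:
            raise ValueError(f"scores must sum to 100, got {total}")


def overall_percentages(verdicts):
    """Average the per-sentence scores so the overall result also sums to 100."""
    if not verdicts:
        raise ValueError("need at least one sentence")
    n = len(verdicts)
    return {
        "true": sum(v.true_pct for v in verdicts) / n,
        "false": sum(v.false_pct for v in verdicts) / n,
        "undecided": sum(v.undecided_pct for v in verdicts) / n,
    }


# Made-up scores for two example sentences
verdicts = [
    SentenceVerdict("The painting was valued at $250,000.", 90.0, 5.0, 5.0),
    SentenceVerdict("The defendant pleaded guilty.", 10.0, 80.0, 10.0),
]
print(overall_percentages(verdicts))
# → {'true': 50.0, 'false': 42.5, 'undecided': 7.5}
```

Averaging treats every sentence equally; weighting by sentence length would be another reasonable choice.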


Scenarios:
- Political Debates 
- Court/Crime Cases


Options:
- Out of the box model
- Fine-Tune a model
- Retrain model
- Create model from scratch




### Agent-Based Approach vs RAG Chain-Based Approach

Agent-Based Approach -> use an OpenAI tools agent when there are multiple tools to include as data sources, e.g., Wikipedia, arXiv, websites, PDFs, etc.
```
from langchain.agents import AgentExecutor, create_openai_tools_agent

# chat_ollama, tools, and premade_prompt are assumed to be defined earlier
agent = create_openai_tools_agent(llm=chat_ollama, tools=tools, prompt=premade_prompt)

# The executor runs the agent loop and calls the tools the agent selects
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

agent_executor.invoke({"input": "Hi"})
```

RAG Chain-Based Approach -> an OpenAI agent isn't necessary when there is a single data source.
```
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

# llm_model, prompt, and db (the vector store) are assumed to be defined earlier
docs_chain = create_stuff_documents_chain(llm=llm_model, prompt=prompt)

retriever = db.as_retriever()

# The retrieval chain fetches documents, then passes them to the docs chain
retrieval_chain = create_retrieval_chain(retriever, docs_chain)

retrieval_chain.invoke({"input": "For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline."})
```


In [22]:
# FAISS DB
from langchain_community.vectorstores import FAISS
# Retrieval Chain = Retriever + Docs Chain
from langchain.chains import create_retrieval_chain, RetrievalQA
# Load PDF Documents
from langchain_community.document_loaders import PyPDFDirectoryLoader
# Embedding Models
from langchain_community.embeddings import OllamaEmbeddings, HuggingFaceEmbeddings
# Large Language Models
from langchain_community.llms import HuggingFaceHub, HuggingFacePipeline, HuggingFaceEndpoint, Ollama
# Docs Chain = LLM + Prompt
from langchain.chains.combine_documents import create_stuff_documents_chain
# Text Splitter
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Prompt Template 
from langchain.prompts import PromptTemplate

In [12]:
# Load in data

pdf_loader = PyPDFDirectoryLoader("./pdf_data_source")
pdf_loader

<langchain_community.document_loaders.pdf.PyPDFDirectoryLoader at 0x127d1d250>

In [13]:
docs = pdf_loader.load()
docs

[Document(metadata={'source': 'pdf_data_source/court_case_script_1.pdf', 'page': 0}, page_content="Title: People v. JohnsonScene: A courtroom. The judge, Hon. Patricia Reyes, presides. The defendant, Alex Johnson, is accused of stealing a valuable painting from an art gallery.Characters:•Judge Patricia Reyes (The Judge)•Mr. Brown (Prosecutor)•Ms. Lee (Defense Attorney)•Alex Johnson (Defendant)•Court ClerkCourt Clerk: All rise. The court is now in session, the Honorable Judge Patricia Reyes presiding.Judge: Please be seated. We are here today in the matter of People v. Johnson, case number 34987. Mr. Johnson, you stand accused of theft of a painting valued at $250,000 from the Blue Horizon Gallery. How do you plead?Defendant: Not guilty, Your Honor.Judge: Very well. Mr. Brown, you may present your opening statement.Mr. Brown (Prosecutor): Thank you, Your Honor. Ladies and gentlemen of the jury, the evidence will show that on the evening of April 12, Mr. Johnson was seen on security foot

In [55]:
# Split Text

text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
text_splitter

<langchain_text_splitters.character.RecursiveCharacterTextSplitter at 0x14340af10>

In [56]:
split_docs = text_splitter.split_documents(docs)
split_docs

[Document(metadata={'source': 'pdf_data_source/court_case_script_1.pdf', 'page': 0}, page_content='Title: People v. JohnsonScene: A courtroom. The judge, Hon. Patricia Reyes, presides. The defendant,'),
 Document(metadata={'source': 'pdf_data_source/court_case_script_1.pdf', 'page': 0}, page_content='The defendant, Alex Johnson, is accused of stealing a valuable painting from an art'),
 Document(metadata={'source': 'pdf_data_source/court_case_script_1.pdf', 'page': 0}, page_content='from an art gallery.Characters:•Judge Patricia Reyes (The Judge)•Mr. Brown (Prosecutor)•Ms. Lee'),
 Document(metadata={'source': 'pdf_data_source/court_case_script_1.pdf', 'page': 0}, page_content='Lee (Defense Attorney)•Alex Johnson (Defendant)•Court ClerkCourt Clerk: All rise. The court is now'),
 Document(metadata={'source': 'pdf_data_source/court_case_script_1.pdf', 'page': 0}, page_content='The court is now in session, the Honorable Judge Patricia Reyes presiding.Judge: Please be seated.'),
 Document(m
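
With `chunk_size=100` and `chunk_overlap=20`, neighbouring chunks share some text, which is why "The defendant," closes the first chunk and opens the second. The sketch below illustrates the overlap idea with a fixed-width sliding window. It is a simplification, not the `RecursiveCharacterTextSplitter` algorithm (which also tries to break on separators such as newlines and spaces), and `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text, chunk_size=100, chunk_overlap=20):
    """Cut text into fixed-width chunks; each chunk repeats the last
    chunk_overlap characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# Placeholder text: 250 characters cycling through the alphabet
text = "".join(chr(ord("a") + i % 26) for i in range(250))
chunks = sliding_window_chunks(text)
print([len(c) for c in chunks])
# → [100, 100, 90]
```

Chunks of about 100 characters rarely hold a complete exchange from the script, so a larger `chunk_size` may be worth trying if retrieved context looks fragmented.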

In [57]:
# Initialize Vector Database
db = FAISS.from_documents(split_docs, embedding=OllamaEmbeddings())
db

<langchain_community.vectorstores.faiss.FAISS at 0x1377b4050>
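
Conceptually, the vector store keeps one embedding vector per chunk and answers a query by returning the chunks whose vectors lie closest to the query vector. The toy sketch below uses hand-written 2-D vectors and Euclidean distance to show the idea. The vectors and the `nearest_chunks` helper are made up for illustration; real embeddings come from the embedding model and have many more dimensions, and FAISS performs this search far more efficiently.

```python
import math


def nearest_chunks(query_vec, indexed, k=2):
    """Return the k chunk texts whose vectors are closest to query_vec."""
    ranked = sorted(indexed, key=lambda item: math.dist(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# (chunk text, made-up embedding) pairs standing in for the indexed chunks
indexed = [
    ("from an art gallery.", (0.9, 0.1)),
    ("Reyes exits the courtroom.)", (0.1, 0.9)),
    ("stealing a valuable painting", (0.8, 0.3)),
]

# Made-up embedding for a question about the stolen painting
query_vec = (1.0, 0.2)
print(nearest_chunks(query_vec, indexed))
# → ['from an art gallery.', 'stealing a valuable painting']
```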

In [58]:
# Initialize LLM
llm = Ollama()
llm

Ollama()

In [80]:
# Initialize Prompt

# prompt = PromptTemplate.from_template(
#                         """
#                         Goal: Give a clear and simple answer and explanation.
#                         Each response should answer the question initially along with bullet points that list out the reasoning.
#                         Each output response should be 20 words or less.
#                         <context>{context}</context>
#                         User type: Commentators, government officials
#                         Example:
#                             Input: {input}
#                             Output: Answers any question regarding the current case.            - 
#                         """)





# prompt = PromptTemplate.from_template(
#     """Given the following script content, please answer the question in a concise, list format. Each point in the list should contain a short, clear response to help make the answer easy to read. Focus on specific details from the script content to directly address the question.

#         <context>:
#         {context}
#         </context>

#         Question:
#         {input}

#         Example Format for Responses:

#         - [First key point]
#         - [Second key point]
#         - [Third key point]
#         - [Additional points, as needed]
#         Make sure each list item is brief but captures essential details to answer the question accurately. Separate distinct ideas into individual list items to improve readability.
#      """)   




        
prompt = PromptTemplate.from_template(  
        """
        
        Based on the script content provided, answer the question below in a short, list format. Each answer should be a single, brief point with only the essential details. Avoid unnecessary context or extra description.

        <context>:
        {context}
        </context>

        Question:
        {input}

        Response Format:
        - (e.g., Name or role)
        - (e.g., Specific action or location)
        - (e.g., Short description)

        Example Usage:
        Question:
        “Where was the painting taken from?”

        Response:
        - The painting was stolen from an art gallery.
        - The thief, Ms. Lee, took the painting for personal profit. 
        - Judge Patricia Reyes sustained a motion to suppress Mr. Tate's last statement regarding the painting's origins.        
        """)

prompt

PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template="\n        \n        Based on the script content provided, answer the question below in a short, list format. Each answer should be a single, brief point with only the essential details. Avoid unnecessary context or extra description.\n\n        <context>:\n        {context}\n        </context>\n\n        Question:\n        {input}\n\n        Response Format:\n        - (e.g., Name or role)\n        - (e.g., Specific action or location)\n        - (e.g., Short description)\n\n        Example Usage:\n        Question:\n        “Where was the painting taken from?”\n\n        Response:\n        - The painting was stolen from an art gallery.\n        - The thief, Ms. Lee, took the painting for personal profit. \n        - Judge Patricia Reyes sustained a motion to suppress Mr. Tate's last statement regarding the painting's origins.        \n        ")

In [81]:
# Establish docs chain that combines the LLM and the prompt
docs_chain = create_stuff_documents_chain(llm, prompt)
docs_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template="\n        \n        Based on the script content provided, answer the question below in a short, list format. Each answer should be a single, brief point with only the essential details. Avoid unnecessary context or extra description.\n\n        <context>:\n        {context}\n        </context>\n\n        Question:\n        {input}\n\n        Response Format:\n        - (e.g., Name or role)\n        - (e.g., Specific action or location)\n        - (e.g., Short description)\n\n        Example Usage:\n        Question:\n        “Where was the painting taken from?”\n\n        Response:\n        - The painting was stolen from an art gallery.\n        - The thief, Ms. Lee, took the painting for personal profit. \

In [82]:
# Create a retriever

retriever = db.as_retriever()
retriever

VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x1377b4050>, search_kwargs={})

In [83]:
# Create retrieval chain 
retrieval_chain = create_retrieval_chain(retriever, docs_chain)
retrieval_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x1377b4050>, search_kwargs={}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template="\n        \n        Based on the script content provided, answer the question below in a short, list format. Each answer should be a single, brief point with only the essential details. Avoid unnecessary context or extra description.\n\n        <context>:\n        {context}\n 
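
The printed structure above has two stages: the retriever fills `context` from `input`, then the docs chain formats those documents into the prompt and asks the LLM for `answer`. The sketch below mimics that data flow with plain-Python stand-ins. `fake_retriever`, `fake_llm`, `run_chain`, and the shortened `TEMPLATE` are hypothetical, and joining documents with a blank line is an assumption about the default formatting.

```python
# Shortened stand-in for the notebook's prompt template
TEMPLATE = "<context>\n{context}\n</context>\n\nQuestion:\n{input}"


def fake_retriever(query):
    """Stand-in for the vector store retriever: returns fixed chunk texts."""
    return ["from an art gallery.", "this painting for personal profit."]


def fake_llm(prompt_text):
    """Stand-in for the LLM: reports the prompt length instead of answering."""
    return f"(model reply to a {len(prompt_text)}-character prompt)"


def run_chain(inputs, retriever=fake_retriever, llm=fake_llm):
    # Stage 1: retrieve documents for the input question
    context_docs = retriever(inputs["input"])
    # Stage 2: "stuff" all documents into the prompt, then call the model
    prompt_text = TEMPLATE.format(
        context="\n\n".join(context_docs), input=inputs["input"]
    )
    return {**inputs, "context": context_docs, "answer": llm(prompt_text)}


result = run_chain({"input": "Where was the painting stolen?"})
print(sorted(result))
# → ['answer', 'context', 'input']
```

The returned dictionary has the same three keys as the real chain's response, which is why `response["answer"]` and `response["context"]` are both available after `invoke`.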

In [91]:
# Invoke the retrieval chain to output a response
response = retrieval_chain.invoke({"input": "Who was all involved in the incident?"})
response

{'input': 'Who was all involved in the incident?',
 'context': [Document(metadata={'source': 'pdf_data_source/court_case_script_2.pdf', 'page': 1}, page_content='Reyes exits the courtroom.)'),
  Document(metadata={'source': 'pdf_data_source/court_case_script_2.pdf', 'page': 0}, page_content='toolkit after he ﬁnished.Ms. Lee: And did you notice anything unusual that evening?Jordan Tate:'),
  Document(metadata={'source': 'pdf_data_source/court_case_script_2.pdf', 'page': 1}, page_content='Sustained. The jury will disregard Mr. Tate’s last statement. Ms. Lee, let’s keep it relevant.Ms.'),
  Document(metadata={'source': 'pdf_data_source/court_case_script_1.pdf', 'page': 1}, page_content='Johnson as the last person seen leaving the building with a large, wrapped object.Ms. Lee:')],
 'answer': '\n• Jordan Tate\n• Ms. Johnson\n• Judge Patricia Reyes'}

In [92]:
response["answer"]

'\n• Jordan Tate\n• Ms. Johnson\n• Judge Patricia Reyes'

In [94]:
response["answer"].split("\n")

['', '• Jordan Tate', '• Ms. Johnson', '• Judge Patricia Reyes']
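
Across the runs in this notebook the model uses different bullet characters (`•`, `*`) and sometimes adds header lines such as "Answer:", so a plain `split("\n")` leaves empty strings and markers in the list. A small post-processing sketch, assuming bullets always start with `•`, `*`, or `-` (`parse_bullets` is a hypothetical helper):

```python
def parse_bullets(answer):
    """Return the bullet items in an answer, without markers or blank lines."""
    items = []
    for line in answer.split("\n"):
        line = line.strip()
        # Keep only lines that start with a common bullet marker
        if line[:1] in ("•", "*", "-"):
            items.append(line[1:].strip())
    return items


answer = "\n• Jordan Tate\n• Ms. Johnson\n• Judge Patricia Reyes"
print(parse_bullets(answer))
# → ['Jordan Tate', 'Ms. Johnson', 'Judge Patricia Reyes']
```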

In [95]:
response = retrieval_chain.invoke({"input": "What was the date, time, and location of the crime?"})
response

{'input': 'What was the date, time, and location of the crime?',
 'context': [Document(metadata={'source': 'pdf_data_source/court_case_script_2.pdf', 'page': 1}, page_content='Reyes exits the courtroom.)'),
  Document(metadata={'source': 'pdf_data_source/court_case_script_1.pdf', 'page': 1}, page_content='Johnson as the last person seen leaving the building with a large, wrapped object.Ms. Lee:'),
  Document(metadata={'source': 'pdf_data_source/court_case_script_1.pdf', 'page': 1}, page_content='footage.Judge: Very well. You may step down, Ofﬁcer. Ms. Lee, you may call your witness.'),
  Document(metadata={'source': 'pdf_data_source/court_case_script_2.pdf', 'page': 0}, page_content='toolkit after he ﬁnished.Ms. Lee: And did you notice anything unusual that evening?Jordan Tate:')],
 'answer': '\nQuestion: What was the date, time, and location of the crime?\n\nResponse:\n\n* Date: Not specified in the provided script content.\n* Time: Not specified in the provided script content.\n* Loc

In [96]:
response["answer"].split("\n")

['',
 'Question: What was the date, time, and location of the crime?',
 '',
 'Response:',
 '',
 '* Date: Not specified in the provided script content.',
 '* Time: Not specified in the provided script content.',
 '* Location: The crime occurred inside the courtroom where the trial was being held.']
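
When an answer looks unsupported, as with the location above, it can help to check which chunks were actually retrieved. The sketch below counts the retrieved chunks per source file and page. `Doc` is a hypothetical stand-in that mirrors the `metadata` and `page_content` attributes seen in the output; with the real chain, `response["context"]` would be passed in instead.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a retrieved document (not the LangChain class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def count_sources(context_docs):
    """Count retrieved chunks per (source file, page number)."""
    return Counter(
        (d.metadata.get("source"), d.metadata.get("page")) for d in context_docs
    )


S1 = "pdf_data_source/court_case_script_1.pdf"
S2 = "pdf_data_source/court_case_script_2.pdf"

# The four chunks retrieved for the date/time/location question
context = [
    Doc("Reyes exits the courtroom.)", {"source": S2, "page": 1}),
    Doc("Johnson as the last person seen leaving the building "
        "with a large, wrapped object.Ms. Lee:", {"source": S1, "page": 1}),
    Doc("footage.Judge: Very well. You may step down, Officer. "
        "Ms. Lee, you may call your witness.", {"source": S1, "page": 1}),
    Doc("toolkit after he finished.Ms. Lee: And did you notice anything "
        "unusual that evening?Jordan Tate:", {"source": S2, "page": 0}),
]

for (source, page), n in count_sources(context).items():
    print(f"{source} (page {page}): {n} chunk(s)")
```

None of these four chunks mentions a date, time, or gallery, which is consistent with the model reporting the date and time as not specified.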

In [97]:
response = retrieval_chain.invoke({"input": "Where was the painting stolen?"})
response

{'input': 'Where was the painting stolen?',
 'context': [Document(metadata={'source': 'pdf_data_source/court_case_script_2.pdf', 'page': 1}, page_content='Reyes exits the courtroom.)'),
  Document(metadata={'source': 'pdf_data_source/court_case_script_2.pdf', 'page': 1}, page_content='Sustained. The jury will disregard Mr. Tate’s last statement. Ms. Lee, let’s keep it relevant.Ms.'),
  Document(metadata={'source': 'pdf_data_source/court_case_script_1.pdf', 'page': 0}, page_content='this painting for personal proﬁt.Judge: Thank you, Mr. Brown. Ms. Lee, your opening statement?Ms.'),
  Document(metadata={'source': 'pdf_data_source/court_case_script_1.pdf', 'page': 0}, page_content='from an art gallery.Characters:•Judge Patricia Reyes (The Judge)•Mr. Brown (Prosecutor)•Ms. Lee')],
 'answer': "\nAnswer:\n\n* Where was the painting stolen from? - Art gallery.\n* What action did Ms. Lee take with the painting? - Took it for personal profit.\n* Who sustained a motion to suppress a statement re

In [98]:
response["answer"].split("\n")

['',
 'Answer:',
 '',
 '* Where was the painting stolen from? - Art gallery.',
 '* What action did Ms. Lee take with the painting? - Took it for personal profit.',
 "* Who sustained a motion to suppress a statement regarding the painting's origins? - Judge Patricia Reyes."]

In [101]:
response = retrieval_chain.invoke({"input": "Where there any witnesses involved in the crime of the stolen art piece?"})
response["answer"].split("\n")

['',
 'Answer:',
 '',
 '* Name or role: Jordan Tate',
 '* Specific action or location: exited the courtroom',
 "* Short description: Reyes sustained a motion to suppress Mr. Tate's last statement regarding the stolen art piece."]