In [30]:
import getpass
import os
api_key = getpass.getpass("OpenAI API key: ")  # prompt for the key without echoing it
os.environ["LANGCHAIN_TRACING_V2"] = "false"  # disable LangSmith tracing
os.environ["OPENAI_API_KEY"] = api_key

In [31]:
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.document_loaders import PyPDFLoader

# bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
# loader = WebBaseLoader(
#     web_paths=("https://arxiv.org/pdf/2406.11288.pdf",),
#     bs_kwargs={"parse_only": bs4_strainer},
# )
# PyPDFLoader returns one Document per PDF page.
loader = PyPDFLoader("2406.11288v1.pdf")
docs = loader.load()

# Character count of the first page.
len(docs[0].page_content)

3323

In [32]:
print(docs[0].page_content[:500])

MFC-Bench : Benchmarking Multimodal
Fact-Checking with Large Vision-Language Models
Shengkang Wang1,∗, Hongzhan Lin2,∗, Ziyang Luo2,∗, Zhen Ye3, Guang Chen1,†, Jing Ma2,†
1Beijing University of Posts and Telecommunications2Hong Kong Baptist University
3Hong Kong University of Science and Technology
{wsk, chenguang}@bupt.edu.cn ,{cshzlin, cszyluo, majing}@comp.hkbu.edu.hk
Abstract
Large vision-language models (LVLMs) have significantly improved multimodal
reasoning tasks, such as visual question 


In [33]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)

len(all_splits)

115

In [34]:
len(all_splits[0].page_content)

945

In [35]:
all_splits[10].metadata

{'source': '2406.11288v1.pdf', 'page': 1, 'start_index': 4676}
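
The effect of `chunk_size`, `chunk_overlap`, and `add_start_index` can be illustrated with a simplified sliding-window splitter. This is only a sketch under stated assumptions: `split_with_overlap` is a hypothetical helper, not LangChain's implementation. `RecursiveCharacterTextSplitter` additionally prefers to cut at separators such as paragraph breaks, newlines, and spaces, which is why the first real chunk above has 945 characters rather than exactly 1000.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into fixed-size windows that share chunk_overlap characters.

    Hypothetical helper for illustration; it ignores separators entirely.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# Synthetic 2500-character document (made-up data, not the paper).
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(sample)
for chunk in chunks:
    print(chunk["start_index"], len(chunk["text"]))
```

Each window starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next. The recorded `start_index` plays the same role as the `start_index` metadata field shown above.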

In [36]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
# from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
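# Never hard-code API tokens in a notebook. If the Hugging Face embeddings are needed, prompt for the token instead:
# hgf_key = getpass.getpass("Hugging Face token: ")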

vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())

In [37]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})

retrieved_docs = retriever.invoke("What are the approaches to Task Decomposition?")

len(retrieved_docs)

6

In [38]:
print(retrieved_docs[0].page_content)

Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
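
What a `"similarity"` search with `k` results does can be sketched in plain Python. The sketch below makes several assumptions: `embed` is a toy word-count embedding standing in for `OpenAIEmbeddings`, cosine similarity is used as the scoring function (the vector store's actual distance metric may differ), and the chunks are made-up strings rather than text from the paper.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts (hypothetical stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks whose embeddings score highest against the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


toy_chunks = [
    "the benchmark covers three fact-checking tasks",
    "we evaluate the models on a benchmark",
    "appendix with prompt templates",
]
results = top_k("which tasks does the benchmark cover", toy_chunks, k=2)
print(results)
```

A real retriever follows the same shape: embed the query, score it against every stored chunk vector, and return the `k` best matches.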


In [39]:
from langchain import hub

prompt = hub.pull("rlm/rag-prompt")

example_messages = prompt.invoke(
    {"context": "filler context", "question": "filler question"}
).to_messages()

example_messages

[HumanMessage(content="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: filler question \nContext: filler context \nAnswer:")]

In [40]:
print(example_messages[0].content)

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: filler question 
Context: filler context 
Answer:
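
The prompt is a template with two placeholders, `context` and `question`. A minimal sketch of the substitution, assuming plain `str.format` as a stand-in for the prompt object pulled from the hub (the template text is copied from the output above):

```python
# Template text taken from the printed message above; str.format is only a
# stand-in for the real prompt object's variable substitution.
RAG_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.\n"
    "Question: {question} \n"
    "Context: {context} \n"
    "Answer:"
)

filled = RAG_TEMPLATE.format(context="filler context", question="filler question")
print(filled)
```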


In [41]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

In [61]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_template("""Use the following pieces of context to answer the user's question about the paper.
If the context doesn't provide enough information, just say that you don't know; don't try to make up an answer.
The user's question can be in Vietnamese or English.
Pay attention to the meaning of the question rather than just looking for similar keywords in the corpus.
Use three sentences maximum and keep the answer as concise as possible.
Output the answer in the language that the user wants to see.
Context: 
{context}
Question: {question}
Answer the question in {language}
Helpful Answer:
""")

# Every key in the dict below is computed from the same input (the question string).
rag_chain = (
    {
        "context": retriever | format_docs,
        "language": lambda x: "Vietnamese",
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)

for chunk in rag_chain.stream("How many criteria are in the MFC benchmark?"):
    print(chunk, end="", flush=True)

MFC-Bench có ba tiêu chí chính để đánh giá độ chính xác của các mô hình ngôn ngữ hình ảnh lớn (LVLMs): Phân loại Manipulation, Phân loại Out-of-Context và Phân loại Veracity.
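
The data flow through the chain can be sketched without LangChain. This is a hedged illustration only: `fake_retriever`, `fake_llm`, `parse`, and `run_parallel` are hypothetical stand-ins for the retriever, `ChatOpenAI`, `StrOutputParser`, and the dict step at the head of the chain, and the template is shortened. No model is called.

```python
def run_parallel(steps, value):
    """Call every function in `steps` with the same input and collect the results."""
    return {name: fn(value) for name, fn in steps.items()}


def fake_retriever(question):
    """Hypothetical retriever: always returns the same two chunks."""
    return ["chunk A", "chunk B"]


def format_docs(docs):
    """Join chunk texts with blank lines, as format_docs does above."""
    return "\n\n".join(docs)


def fake_llm(prompt_text):
    """Hypothetical model: reports the prompt length instead of answering."""
    return {"content": f"[{len(prompt_text)} prompt characters received]"}


def parse(message):
    """Extract the text from the model's message."""
    return message["content"]


TEMPLATE = "Context:\n{context}\nQuestion: {question}\nAnswer the question in {language}"


def rag_chain(question):
    # Step 1: build the prompt variables from the single input question.
    inputs = run_parallel(
        {
            "context": lambda q: format_docs(fake_retriever(q)),
            "language": lambda q: "Vietnamese",
            "question": lambda q: q,
        },
        question,
    )
    # Step 2: fill the template. Step 3: call the model. Step 4: parse to a string.
    prompt_text = TEMPLATE.format(**inputs)
    return parse(fake_llm(prompt_text)), prompt_text


answer, prompt_text = rag_chain("How many criteria are in MFC-Bench?")
print(prompt_text)
print(answer)
```

The `|` operator in the real chain composes the same four stages; the main difference is that each stage there also supports streaming, which is what `rag_chain.stream(...)` relies on.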