## Paper ingestion and question-answering system

Loading the document

In [3]:
from langchain_community.document_loaders import PyPDFLoader

file_path = "../project/example_data/2408.00714v1.pdf"
loader = PyPDFLoader(file_path)

docs = loader.load()

# print(len(docs))

Question answering with RAG

In [2]:
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

from langchain_openai import ChatOpenAI


# llm = ChatOpenAI(model="gpt-4o")
llm = ChatOpenAI(model="gpt-3.5-turbo-0125")


In [4]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

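# Split the pages into chunks of at most 1000 characters, with up to 200 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
# Embed each chunk and index it in a Chroma vector store
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# Expose the vector store as a retriever for the RAG chain
retriever = vectorstore.as_retriever()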
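
To make the `chunk_size=1000` and `chunk_overlap=200` settings above more concrete, here is a minimal pure-Python sketch of overlapping chunking. It is a simplification, not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter prefers to break at separators such as paragraph breaks, newlines, and spaces, so its chunks will generally differ. The function name `chunk_text` and the sample string are made up for illustration.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters.

    Hypothetical helper for illustration only; it ignores separators entirely.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# A 2500-character stand-in for a page of text
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)

print([len(c) for c in chunks])
# The tail of one chunk is repeated at the head of the next
print(chunks[0][-200:] == chunks[1][:200])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.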

In [27]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)


question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

results = rag_chain.invoke({"input": "Give me some key points of the paper."})

results

In [28]:
results['answer']

'The paper acknowledges the contributions of various individuals involved in data support, annotation engineering, management, compute support, and project-level support. There is no plan to validate or verify further annotations for SA-V, but annotations were reviewed for quality assurance during training and production stages. Sociodemographic characteristics were not used to select annotators for masklet annotations, and diversity among crowdworkers was encouraged during video collection. The final labels in the dataset are a result of data cleaning and post-processing from individual annotator responses.'
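
The answer above dwells on acknowledgements and annotation details rather than the paper's main contributions, which suggests the retrieved chunks came from the back matter. One way to check is to look at `results["context"]`, the list of retrieved documents that `create_retrieval_chain` returns alongside `"input"` and `"answer"`. The sketch below assumes each document carries a `metadata["page"]` entry, as `PyPDFLoader` normally sets; `pages_used` is a hypothetical helper, and `fake_results` is a made-up stand-in (its page numbers are not real) so that the snippet runs without an API key.

```python
from collections import Counter
from types import SimpleNamespace


def pages_used(results: dict) -> Counter:
    """Count how many retrieved chunks came from each PDF page.

    Assumes every document in results["context"] has a metadata dict
    with a "page" key; chunks without one are counted under None.
    """
    return Counter(doc.metadata.get("page") for doc in results.get("context", []))


# Made-up stand-in shaped like the chain output; page numbers are illustrative only
fake_results = {
    "input": "Give me some key points of the paper.",
    "context": [
        SimpleNamespace(metadata={"page": 40}, page_content="placeholder chunk A"),
        SimpleNamespace(metadata={"page": 40}, page_content="placeholder chunk B"),
        SimpleNamespace(metadata={"page": 41}, page_content="placeholder chunk C"),
    ],
    "answer": "placeholder answer",
}

for page, count in pages_used(fake_results).most_common():
    print(f"page {page}: {count} chunk(s)")
```

In the notebook, calling `pages_used(results)` on the real chain output would show which pages the answer was built from.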

## Building an LLM app

In [31]:
print("Welcome to the paperbot. If you want to quit, please enter 'exit'.")
while True:
    # Input
    user_input = input("User: ")

    # End the conversation when the user types "exit"
    if user_input.lower() == "exit":
        print("Thank you.")
        break

    # Generate and print the response
    results = rag_chain.invoke({"input": user_input})
    print(f"User: {user_input}")
    print(f"Assistant: {results['answer']}")

Welcome to the . To quit, enter '종료'.
User: What is the subjective?
Assistant: The subjective aspects of the task involve selecting objects to mask and track in a video, which can vary based on annotators' decisions and interpretations. Annotators are assumed to understand the PVS task and receive regular training and feedback to ensure consistency in their annotations. The task's subjectivity stems from the differing perspectives and interpretations of annotators when choosing objects to mask and track in videos.
User: quit
Assistant: I'm here to help. Let me know if you have any questions.
Thank you.
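
In the transcript above, `quit` is not recognized as an exit word, so it is forwarded to the chain and answered like any other question. A minimal sketch of a more forgiving check follows; the helper name `is_exit_command` and the particular set of accepted words are assumptions for illustration, not part of the original loop.

```python
# Hypothetical set of words that end the session; adjust as needed
EXIT_COMMANDS = {"exit", "quit", "종료"}


def is_exit_command(user_input: str) -> bool:
    """Return True if the input, ignoring case and surrounding spaces, is an exit word."""
    return user_input.strip().casefold() in EXIT_COMMANDS


print(is_exit_command("quit"))
print(is_exit_command("  EXIT "))
print(is_exit_command("What is the subjective?"))
```

Inside the loop, `if is_exit_command(user_input):` could stand in for the `user_input.lower() == "exit"` comparison.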


## Building a chatbot