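## Evaluation notebook code
This notebook builds sentences by filling an arbitrary sentence template with the values from the LLM answer, then evaluates them by comparing them against sentences built the same way from the ground truth.  
Only `question1` and `question2` are covered so far.  
`question3` and `question4` may still change, so they will be handled later (this should be quick).  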

In [None]:
# !pip install transformers
# !pip install rouge-score
# !pip install sentence_transformers

In [8]:

import os
from langchain_core.vectorstores.base import VectorStoreRetriever
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages
from langchain_openai import ChatOpenAI
from langgraph.graph import END, StateGraph
from langgraph.checkpoint.memory import MemorySaver
from langchain_core.documents.base import Document
from langchain_core.output_parsers import JsonOutputParser
from langchain_teddynote.messages import messages_to_history
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field
# from langchain_teddynote.evaluator import GroundednessChecker

from dotenv import load_dotenv
from langchain_core.runnables import RunnableConfig
from langchain_teddynote.messages import random_uuid

import yaml
import json

import pandas as pd
from metrics.metric_eval import evaluate_all_metrics

In [9]:
# Load the .env file
load_dotenv(dotenv_path=".env")

# Read the API keys
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
LANGCHAIN_API_KEY = os.getenv("LANGCHAIN_API_KEY")

# Enable LangSmith tracing. (optional)
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Multi-agent Collaboration"

In [10]:
def load_question(question_path:str="./config/questions/250115-SY-question.yaml"):
    with open(question_path, 'r') as file:
        questions = yaml.safe_load(file)
    
    question_list = []
    for i in range(1, 5):
        if i == 3 or i == 4:
            temp_question = f"""
{questions["main_question"]}{questions[f"add_question{i}"]}
{json.dumps(questions[f"example{i}"], ensure_ascii=False, indent=4)}
""" 
        else: 
            temp_question = f"""
{questions["main_question"]}
{json.dumps(questions[f"example{i}"], ensure_ascii=False, indent=4)}
"""

        question_list.append(temp_question)        

    return question_list
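To make the expected YAML layout concrete, here is a minimal sketch that composes the four prompts from an in-memory dict instead of the YAML file. The key names (`main_question`, `add_question3`, `add_question4`, `example1` to `example4`) come from `load_question`; every value below is a made-up placeholder, and `build_questions` is a hypothetical name, not part of the notebook.

```python
import json

# Hypothetical stand-in for the parsed YAML; only the key names come from the notebook.
questions = {
    "main_question": "Fill out the items below.",
    "add_question3": " Focus on morphology.",
    "add_question4": " Focus on performance.",
    "example1": [{"CAM (Cathode Active Material)": {"Li ratio": "1"}}],
    "example2": [{"Electrode (only for coin-cell (half-cell))": {}}],
    "example3": [{"Morphological results": {}}],
    "example4": [{"Cathode Performance": {}}],
}


def build_questions(questions: dict) -> list:
    """Compose one prompt per question from the main question and its example."""
    prompts = []
    for i in range(1, 5):
        # Questions 3 and 4 append an extra instruction to the shared main question.
        extra = questions[f"add_question{i}"] if i in (3, 4) else ""
        example = json.dumps(questions[f"example{i}"], ensure_ascii=False, indent=4)
        prompts.append(f"\n{questions['main_question']}{extra}\n{example}\n")
    return prompts


prompts = build_questions(questions)
print(prompts[2])
```

`ensure_ascii=False` keeps any non-ASCII characters in the examples readable in the prompt instead of escaping them.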

In [11]:
question_list = load_question()

In [12]:
print(question_list[0])


Below are instructions for filling out items by referring to the examples.
The values shown to the right of the colon (":") are examples;
please delete them after reviewing and rewrite them with the values found in the PDF.
If any item is not mentioned in the PDF, do not remove it—write "None."

[
    {
        "CAM (Cathode Active Material)": {
            "Stoichiometry information": {
                "NCM-622": {
                    "Li ratio": "1",
                    "Ni ratio": "0.6",
                    "Co ratio": "0.2",
                    "Mn ratio": "0.2",
                    "O ratio": "2"
                },
                "ZrO2-NCM-622 (Z622)": {
                    "Li ratio": "0.98",
                    "Ni ratio": "0.6",
                    "Co ratio": "0.2",
                    "Mn ratio": "0.2",
                    "O ratio": "2"
                }
            },
            "Whether or not commercial NCM was used for each sample (Stoichiometry information in order)"

### tools.py

In [13]:
def embedding_file(file_folder: str, file_name: str) -> VectorStoreRetriever:
    """Split a PDF into chunks, embed them with the OpenAI embedding model (text-embedding-ada-002), and store them in a vector store. Then build a retriever on top of that vector store.

    Args:
        file_folder (str): folder that contains the PDF
        file_name (str): PDF file name without the ".pdf" extension

    Returns:
        VectorStoreRetriever: the retriever
    """
    ## Splitter that breaks long text into smaller chunks
    splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=500,         ## maximum chunk length
        chunk_overlap=100,      ## overlap between adjacent chunks
        separators=["\n\n"]     ## separator used when splitting (paragraphs)
    )

    ## Load the PDF file
    loader = PyPDFLoader(f"{file_folder}/{file_name}.pdf")
    docs = loader.load_and_split(text_splitter=splitter)

    ## Create the embeddings and store them in the vector store
    embeddings = OpenAIEmbeddings()
    vector_store = FAISS.from_documents(
        documents=docs,         ## documents to add to the vector store
        embedding=embeddings    ## embedding function to use
    )

    ## Convert to a retriever: builds a VectorStoreRetriever on top of the current vector store
    retriever = vector_store.as_retriever(
        search_type="similarity",    ## search strategy; similarity is the default
        search_kwargs={"k": 10}
    )

    return retriever

### graph.py

In [14]:
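# Define the GraphState
class GraphState(TypedDict):
    question: Annotated[str, "Question"]  # question
    context: Annotated[str, "Context"]  # retrieved text from the document
    answer: Annotated[str, "Answer"]  # answer
    relevance: Annotated[str, "Relevance"]  # relevance check result ("yes" or "no")
    messages: Annotated[list, add_messages]  # messages (accumulated list)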
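The `messages` key is annotated with the `add_messages` reducer, so a node's update is appended to the existing list, while the other keys are simply overwritten. The following is a simplified pure-Python illustration of that difference, not LangGraph's implementation (the real reducer also converts tuples to message objects and merges by message ID). `apply_update` is a hypothetical helper name.

```python
def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state (simplified model)."""
    new_state = dict(state)
    for key, value in update.items():
        if key == "messages":
            # Reducer-style key: append to what is already there.
            new_state[key] = list(state.get(key, [])) + list(value)
        else:
            # Plain key: the latest value wins.
            new_state[key] = value
    return new_state


state = {"question": "q", "answer": "old", "messages": [("user", "q")]}
state = apply_update(state, {"answer": "new", "messages": [("assistant", "new")]})
print(state["answer"])    # new
print(state["messages"])  # [('user', 'q'), ('assistant', 'new')]
```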

In [15]:
# Build the graph
class DataExtractor:
    def __init__(
        self, 
        file_folder:str="./data/input_data", 
        file_number:int=1, 
        # db_folder:str="./vectordb"
    ):
        if file_number < 10:
            file_name = f"paper_00{file_number}"
        elif file_number < 100:
            file_name = f"paper_0{file_number}"
        else:
            file_name = f"paper_{file_number}"

        self.retriever = embedding_file(
            file_folder=file_folder, 
            file_name=file_name, 
            # db_folder=db_folder
        )
        
        self.model = ChatOpenAI(model_name="gpt-4o", temperature=0.5)
        self.relevance_checker = ChatOpenAI(model="gpt-4o", temperature=0.5)
        self.llm_answer_prompt = """
        Based on the following document, please provide an answer to the given question.
        Document:
        {context}

        Question:
        {question}

        Answer:
        """
        self.relevance_check_template = """
        You are a grader assessing relevance of a retrieved document to a user question. \n 
        Here is the retrieved document: \n\n {context} \n\n
        Here is the answer: {answer} \n
        If the document contains keyword(s) or semantic meaning related to the user answer, grade it as relevant. \n
        
        Give a binary score 'yes' or 'no' score to indicate whether the retrieved document is relevant to the answer.
        If the retrieved document does not contain the values or information being searched for, and 'None' is provided as the answer, check if the response accurately reflects the absence of the requested information. If the absence is accurate and justified, grade the document as relevant even if the values are 'None'.
        """
        
        # Create the graph
        builder = StateGraph(GraphState)

        # Define the nodes
        builder.add_node("retrieve", self.retrieve_document)
        builder.add_node("relevance_check", self.relevance_check)
        builder.add_node("llm_answer", self.llm_answer)

        # Define the edges
        builder.add_edge("retrieve", "llm_answer")  # retrieval -> answer generation
        builder.add_edge("llm_answer", "relevance_check")  # answer generation -> relevance check

        # Add the conditional edges.
        builder.add_conditional_edges(
            "relevance_check",  # the result of the relevance check node is passed to is_relevant
            self.is_relevant,
            {
                "yes": END,  # relevant: go to the end
                "no": "retrieve",  # not relevant: retrieve again
            },
        )

        # Set the graph entry point
        builder.set_entry_point("retrieve")
        
        # Set up the checkpointer
        memory = MemorySaver()

        # Compile
        self.graph = builder.compile(checkpointer=memory)        
        self.graph.get_graph().draw_mermaid_png(output_file_path="graph.png")

    
    def format_docs(self, docs: list[Document]) -> str:
        """Extract the text from a list of documents and join it into a single string.

        Args:
            docs (list[Document]): list of Document objects

        Returns:
            str: the text of all documents joined into one string
        """
        return "\n\n".join(doc.page_content for doc in docs)
    
    
    def retrieve_document(self, state: GraphState) -> GraphState:
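        """Search the document for chunks that are relevant to the question.

        Args:
            state (GraphState): state from which the question is read

        Returns:
            GraphState: state update with the retrieved text stored under the context key
        """        
        # Read the question from the state.
        latest_question = state["question"]

        # Retrieve the chunks that are relevant to the question.
        retrieved_docs = self.retriever.invoke(latest_question)

        # Format the retrieved chunks (so they can be passed into the prompt).
        retrieved_docs = self.format_docs(retrieved_docs)

        # Store the retrieved text under the context key.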
        return GraphState(context=retrieved_docs)
    
    
    def llm_answer(self, state: GraphState) -> GraphState:
        """Have the LLM answer the question according to the prompt.

        Args:
            state (GraphState): state from which the question and the retrieved context are read

        Returns:
            GraphState: state update holding the JSON answer and the (user question, answer) messages
        """        
        # Read the question from the state.
        latest_question = state["question"]

        # Read the retrieved context from the state.
        context = state["context"]

        # Set up the prompt
        prompt = PromptTemplate(
            template=self.llm_answer_prompt,
            input_variables=["context", "question"],
            )

        # Build and invoke the chain
        chain = prompt | self.model | JsonOutputParser()

        response = chain.invoke(
            {
                "question": latest_question,
                "context": context,
                "chat_history": messages_to_history(state["messages"]),
            }
        )

        # Store the generated answer and the (user question, answer) messages in the state.
        return GraphState(
            answer=response,
            messages=[("user", latest_question), ("assistant", response)]
        )


    def relevance_check(self, state: GraphState) -> GraphState:
        """Evaluate the relevance between the answer and the retrieved context.

        Args:
            state (GraphState): state from which the retrieved context and the answer are read

        Returns:
            GraphState: state update holding the relevance score
        """    
        
        class GradeAnswer(BaseModel):
            """Binary scoring to evaluate the appropriateness of answers to retrieval"""

            binary_score: str = Field(
                description="Indicate 'yes' or 'no' whether the answer solves the question"
            )
            
        # Create the prompt
        prompt = PromptTemplate(
            template=self.relevance_check_template,
            input_variables=["context", "answer"],
        )

        # Chain
        structured_relevance_checker = self.relevance_checker.with_structured_output(GradeAnswer)
        relevance_chain = prompt | structured_relevance_checker
        
        # retrieval_answer_relevant = GroundednessChecker(
        #     llm=self.relevance_checker, target="retrieval-answer"
        # ).create()

        # Run the relevance check ("yes" or "no")
        response = relevance_chain.invoke(
            {"context": state["context"], "answer": state["answer"]}
        )

        print(f"        RELEVANCE CHECK : {response.binary_score}")

        # Note: the relevance grader here can be customized with your own prompt. Try building your own groundedness check!
        return GraphState(relevance=response.binary_score)


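    def is_relevant(self, state: GraphState) -> str:
        """Routing function for the conditional edge after the relevance check.

        Args:
            state (GraphState): state that holds the relevance score

        Returns:
            str: the relevance score ("yes" or "no"), used to pick the next node
        """        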
        return state["relevance"]

### output

In [16]:
def eval_df_generator(
    file_folder="./data/input_data/",      ## folder that contains the input papers
    file_num_list=[11, 16, 22],            ## paper 8 does not work yet
    question_number=1,                     ## question number: one of 1, 2, 3, 4
):
    eval_df = pd.DataFrame(columns=["paper_number", "question", "ground_truth", "answer", "context"])
    for i, file_number in enumerate(file_num_list):
        print(f"#####   Paper {file_number}    #####")
        voltai_graph = DataExtractor(
            file_folder=file_folder,
            file_number=file_number
        ).graph

        # Config (maximum recursion depth, thread_id)
        config = RunnableConfig(
            recursion_limit=20, 
            # configurable={"thread_id": str(uuid.uuid4())}
            configurable={"thread_id": random_uuid()}
        )

        # Run the graph for the selected question
        print(f"    Question {question_number}")
        result = voltai_graph.invoke(input={"question":question_list[question_number-1]}, config=config)

        ### Save the result
        ## Load the ground truth
        if file_number < 10:
            file_name = f"paper_00{file_number}"
        elif file_number < 100:
            file_name = f"paper_0{file_number}"
        else:
            file_name = f"paper_{file_number}"
    
        # Path to the JSON file
        file_path = f"./data/ground_truth/{file_name}_gt.json"

        # Read the JSON file
        with open(file_path, "r", encoding="utf-8") as file:
            gt = json.load(file)

        ## Store the results in the DataFrame
        if question_number == 1:
            category_name = "CAM (Cathode Active Material)"
        elif question_number == 2:
            category_name = "Electrode (only for coin-cell (half-cell))"
        elif question_number == 3:
            category_name = "Morphological results"
        elif question_number == 4:
            category_name = "Cathode Performance"    
        else:
            # A bare `raise` has no active exception to re-raise here; raise an explicit error instead.
            raise ValueError(f"question_number must be 1, 2, 3, or 4, got {question_number}")
        
        eval_df.loc[i, "paper_number"] = file_number
        eval_df.loc[i, "question"] = result["question"].replace("\n", "")
        eval_df.loc[i, "answer"] = [result["answer"][0][category_name]]
        eval_df.loc[i, "ground_truth"] = [gt[category_name]]
        eval_df.loc[i, "context"] = result["context"]
        
    return eval_df
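The zero-padding branches and the `if`/`elif` chain in `eval_df_generator` could be written more compactly. The sketch below is one possible refactoring, not code from the notebook: `paper_file_name`, `category_name_for`, and `CATEGORY_BY_QUESTION` are hypothetical names, while the category strings and the `paper_NNN` naming pattern are taken from the cells above.

```python
# Category key in the answer / ground truth JSON for each question number.
CATEGORY_BY_QUESTION = {
    1: "CAM (Cathode Active Material)",
    2: "Electrode (only for coin-cell (half-cell))",
    3: "Morphological results",
    4: "Cathode Performance",
}


def paper_file_name(file_number: int) -> str:
    """Zero-pad the paper number to three digits, e.g. 8 -> paper_008."""
    return f"paper_{file_number:03d}"


def category_name_for(question_number: int) -> str:
    """Look up the category key, failing loudly on an unknown question number."""
    try:
        return CATEGORY_BY_QUESTION[question_number]
    except KeyError:
        raise ValueError(
            f"question_number must be 1, 2, 3, or 4, got {question_number}"
        ) from None


print(paper_file_name(11))   # paper_011
print(category_name_for(2))  # Electrode (only for coin-cell (half-cell))
```

Sharing `paper_file_name` between `DataExtractor` and `eval_df_generator` would keep the PDF name and the ground truth file name in sync.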

### eval

In [17]:
file_num_list = [11, 16, 22]
# question_number = 2

In [18]:
q1_eval_frame = eval_df_generator(
    file_folder="./data/input_data/",       ## folder that contains the input papers
    file_num_list=file_num_list,            ## paper 8 does not work yet
    question_number=1,        ## question number: one of 1, 2, 3, 4
)

#####   Paper 11    #####
    Question 1
        RELEVANCE CHECK : yes
#####   Paper 16    #####
    Question 1
        RELEVANCE CHECK : yes
#####   Paper 22    #####
    Question 1
        RELEVANCE CHECK : yes


In [19]:
q2_eval_frame = eval_df_generator(
    file_folder="./data/input_data/",       ## folder that contains the input papers
    file_num_list=file_num_list,            ## paper 8 does not work yet
    question_number=2,        ## question number: one of 1, 2, 3, 4
)

#####   Paper 11    #####
    Question 2
        RELEVANCE CHECK : yes
#####   Paper 16    #####
    Question 2
        RELEVANCE CHECK : no
        RELEVANCE CHECK : yes
#####   Paper 22    #####
    Question 2
        RELEVANCE CHECK : yes


In [20]:
# q3_eval_frame = eval_df_generator(
#     file_folder="./data/input_data/",       ## folder that contains the input papers
#     file_num_list=file_num_list,            ## paper 8 does not work yet
#     question_number=3,        ## question number: one of 1, 2, 3, 4
# )

In [21]:
# q4_eval_frame = eval_df_generator(
#     file_folder="./data/input_data/",       ## folder that contains the input papers
#     file_num_list=file_num_list,            ## paper 8 does not work yet
#     question_number=4,        ## question number: one of 1, 2, 3, 4
# )

In [22]:
q1_eval_frame

Unnamed: 0,paper_number,question,ground_truth,answer,context
0,11,Below are instructions for filling out items b...,[{'Stoichiometry information': {'LiNi1/3Co1/3M...,[{'Stoichiometry information': {'NCM-111': {'L...,kinetics of the intercalation/deintercalation ...
1,16,Below are instructions for filling out items b...,[{'Stoichiometry information': {'LiNi1/3Co1/3M...,[{'Stoichiometry information': {'NCM': {'Li ra...,was conducted on the electrochemical equipment...
2,22,Below are instructions for filling out items b...,[{'Stoichiometry information': {'LiNi0.84Co0.1...,[{'Stoichiometry information': {'NCM-622': {'L...,2Scientific RepoRts | (2019) 9:8952...


In [23]:
q2_eval_frame

Unnamed: 0,paper_number,question,ground_truth,answer,context
0,11,Below are instructions for filling out items b...,[{'Active material to Conductive additive to B...,[{'Active material to Conductive additive to B...,or to utilize conductive additives such as gra...
1,16,Below are instructions for filling out items b...,[{'Active material to Conductive additive to B...,[{'Active material to Conductive additive to B...,Notes\nThe authors declare no competingﬁnancia...
2,22,Below are instructions for filling out items b...,[{'Active material to Conductive additive to B...,[{'Active material to Conductive additive to B...,2Scientific RepoRts | (2019) 9:8952...


In [24]:
# q3_eval_frame

In [25]:
# q4_eval_frame

#### 1. Evaluation by filling values into an arbitrary sentence template

In [26]:
def eval_sentence_generator(question_number, values):
    """Fill the sentence template for the given question with values, in order."""
    if question_number == 1:
        return f"The {values[0]} is {values[1]} commercially obtained, and synthesized with {values[2]}, and {values[3]}, and, {values[4]} for {values[5]} at {values[6]}, with {values[7]} doping and {values[8]} coating and {values[9]} treatment."
    elif question_number == 2:
        return f"The electrode is manufactured by {values[0]} with {values[1]} and {values[2]} is {values[3]} for {values[4]}."
    else:
        # Templates for questions 3 and 4 are not defined yet.
        raise ValueError(f"no sentence template for question {question_number}")
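The recorded sentences further down show that nested values, such as the stoichiometry dict, are inserted into the template as their Python `repr` (for example `The {'NCM-111': {'Li ratio': '1', ...`). One possible alternative, which is not what this notebook does, is to flatten nested values to plain text before filling the template. `flatten_value` is a hypothetical helper; the sample values are taken from the question example shown earlier.

```python
def flatten_value(value) -> str:
    """Render nested dicts and lists as plain text instead of a Python repr."""
    if isinstance(value, dict):
        return ", ".join(f"{key} {flatten_value(val)}" for key, val in value.items())
    if isinstance(value, (list, tuple)):
        return ", ".join(flatten_value(item) for item in value)
    return str(value)


nested = {"NCM-622": {"Li ratio": "1", "Ni ratio": "0.6"}}
print(flatten_value(nested))  # NCM-622 Li ratio 1, Ni ratio 0.6
```

Flattening would change every generated sentence, so scores computed this way would not be comparable with the ones recorded in this notebook.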


In [27]:
q1_eval_frame["ground_truth_sentence"] = q1_eval_frame["ground_truth"].apply(lambda x: eval_sentence_generator(1, list(x[0].values())))
q1_eval_frame["answer_sentence"] = q1_eval_frame["answer"].apply(lambda x: eval_sentence_generator(1, list(x[0].values()))) 

In [28]:
q2_eval_frame["ground_truth_sentence"] = q2_eval_frame["ground_truth"].apply(lambda x: eval_sentence_generator(question_number=2, values=list(x[0].values())))
q2_eval_frame["answer_sentence"] = q2_eval_frame["answer"].apply(lambda x: eval_sentence_generator(question_number=2, values=list(x[0].values())))

### all metric

In [44]:
## DataFrame with all metrics
q1_all_metric_df = q1_eval_frame[["ground_truth_sentence", "answer_sentence"]].apply(
    lambda row: evaluate_all_metrics(row["ground_truth_sentence"], row["answer_sentence"]), 
    axis=1
).apply(pd.Series)

## Expand the ROUGE dict into separate columns
q1_rouge_df = q1_all_metric_df["ROUGE"].apply(pd.Series)

## Final evaluation DataFrame
q1_eval_df = pd.concat([q1_eval_frame, q1_all_metric_df.drop(columns=["ROUGE"]), q1_rouge_df], axis=1)

In [46]:
q1_eval_df

Unnamed: 0,paper_number,question,ground_truth,answer,context,ground_truth_sentence,answer_sentence,BLEU,METEOR,Semantic Similarity (STS),rouge1,rouge2,rougeL
0,11,Below are instructions for filling out items b...,[{'Stoichiometry information': {'LiNi1/3Co1/3M...,[{'Stoichiometry information': {'NCM-111': {'L...,kinetics of the intercalation/deintercalation ...,"The {'LiNi1/3Co1/3Mn1/3O2': {'Li ratio': '1', ...","The {'NCM-111': {'Li ratio': '1', 'Ni ratio': ...",0.614218,0.68741,0.843172,0.703125,0.539683,0.640625
1,16,Below are instructions for filling out items b...,[{'Stoichiometry information': {'LiNi1/3Co1/3M...,[{'Stoichiometry information': {'NCM': {'Li ra...,was conducted on the electrochemical equipment...,"The {'LiNi1/3Co1/3Mn1/3O2': {'Li ratio': '1', ...","The {'NCM': {'Li ratio': '1.0', 'Ni ratio': '0...",0.46967,0.726418,0.852117,0.671329,0.524823,0.657343
2,22,Below are instructions for filling out items b...,[{'Stoichiometry information': {'LiNi0.84Co0.1...,[{'Stoichiometry information': {'NCM-622': {'L...,2Scientific RepoRts | (2019) 9:8952...,The {'LiNi0.84Co0.10Mn0.06O2': {'Li ratio': '1...,"The {'NCM-622': {'Li ratio': 'None', 'Ni ratio...",0.401726,0.698779,0.691044,0.629213,0.488636,0.58427
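The internals of `evaluate_all_metrics` are not shown in this notebook, so as a rough intuition for the `rouge1` column, here is a simplified unigram-overlap F1 in pure Python. It assumes lowercase whitespace tokenization, which is not how the `rouge-score` package tokenizes, so it will not reproduce the numbers in the table. `rouge1_f1` is a hypothetical name, and the two sample sentences are shortened from the question 2 results for paper 16.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between two sentences (whitespace tokens, lowercased)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "The electrode is manufactured by 80:15:5"
candidate = "The electrode is manufactured by 80 : 5 : 15"
print(rouge1_f1(reference, candidate))
```

Most of the overlap in this pair comes from the fixed template words rather than from the extracted values, which is worth keeping in mind when reading the scores.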


In [45]:
## DataFrame with all metrics
q2_all_metric_df = q2_eval_frame[["ground_truth_sentence", "answer_sentence"]].apply(
    lambda row: evaluate_all_metrics(row["ground_truth_sentence"], row["answer_sentence"]), 
    axis=1
).apply(pd.Series)

## Expand the ROUGE dict into separate columns
q2_rouge_df = q2_all_metric_df["ROUGE"].apply(pd.Series)

## Final evaluation DataFrame
q2_eval_df = pd.concat([q2_eval_frame, q2_all_metric_df.drop(columns=["ROUGE"]), q2_rouge_df], axis=1)

In [47]:
q2_eval_df

Unnamed: 0,paper_number,question,ground_truth,answer,context,ground_truth_sentence,answer_sentence,BLEU,METEOR,Semantic Similarity (STS),rouge1,rouge2,rougeL
0,11,Below are instructions for filling out items b...,[{'Active material to Conductive additive to B...,[{'Active material to Conductive additive to B...,or to utilize conductive additives such as gra...,The electrode is manufactured by NCM : RGO : P...,The electrode is manufactured by None with {'S...,0.424187,0.589008,0.882969,0.683544,0.597403,0.683544
1,16,Below are instructions for filling out items b...,[{'Active material to Conductive additive to B...,[{'Active material to Conductive additive to B...,Notes\nThe authors declare no competingﬁnancia...,The electrode is manufactured by 80:15:5 with ...,The electrode is manufactured by 80 : 5 : 15 w...,0.511568,0.684932,0.949423,0.84507,0.637681,0.816901
2,22,Below are instructions for filling out items b...,[{'Active material to Conductive additive to B...,[{'Active material to Conductive additive to B...,2Scientific RepoRts | (2019) 9:8952...,The electrode is manufactured by 96: 2: 2 with...,The electrode is manufactured by 96 : 2 : 2 wi...,0.460498,0.613626,0.952416,0.781609,0.658824,0.781609


In [2]:
# from pprint import pprint
# print("#############################################################")
# pprint("question1 ground truth sentence")
# pprint(q1_eval_df["ground_truth_sentence"][0])
# print("#############################################################")
# pprint("question1 answer sentence")
# pprint(q1_eval_df["answer_sentence"][0])

In [299]:
# print("#############################################################")
# pprint("question2 ground truth sentence")
# pprint(q2_eval_df["ground_truth_sentence"][0])
# print("#############################################################")
# pprint("question2 answer sentence")
# pprint(q2_eval_df["answer_sentence"][0])
