# LLM Reader Replacement Experiment

Ollama must be installed.

Test environment: macOS, M2 Max, 32 GB RAM

How to use llama2 through a local Ollama server:

1. Install Ollama

    ```bash
    brew install ollama
    ```

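2. Start the Ollama server and load llama2 (if `ollama run` reports that it cannot connect, run `ollama serve` in another terminal first)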

    ```bash
    ollama run llama2
    ```

3. Stop the Ollama server (after the experiment)

    ```bash
    pkill ollama
    ```
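
Before running the notebook cells, it can help to confirm that the server is reachable. The sketch below is an illustration only: it assumes the server listens on Ollama's default address `http://localhost:11434` and that its root endpoint answers with HTTP 200. The helper name `is_ollama_up` is hypothetical and not part of the original notebook.

```python
import urllib.error
import urllib.request


def is_ollama_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on base_url answers with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar errors
        return False


print("Ollama reachable:", is_ollama_up())
```

If this prints `False`, the later `OllamaLLM` calls will most likely fail with a connection error as well.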

## Install modules

In [None]:
%pip install langchain-ollama==0.2.0
%pip install langchain==0.3.4
%pip install matplotlib==3.9.2
%pip install numpy==1.26.4
%pip install pandas==2.2.3
%pip install pyarrow==17.0.0
%pip install seaborn==0.13.2
%pip install sentencepiece==0.2.0
%pip install transformers==4.45.2
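
After installation, the pinned versions can be compared with what is actually importable in the kernel. This is a minimal sketch using only the standard library; the `PINNED` mapping copies a few of the versions listed above, and the helper name `check_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# A subset of the pins listed above; extend as needed
PINNED = {
    "langchain-ollama": "0.2.0",
    "langchain": "0.3.4",
    "pandas": "2.2.3",
    "pyarrow": "17.0.0",
}


def check_versions(pinned):
    """Map each package name to (expected version, installed version or None)."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # Package is not installed in this environment
        report[name] = (expected, installed)
    return report


for name, (expected, installed) in check_versions(PINNED).items():
    status = "ok" if installed == expected else "MISMATCH"
    print(f"{name}: expected {expected}, installed {installed} [{status}]")
```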

In [1]:
import pandas as pd
import pyarrow as pa
import numpy as np
import string
from collections import Counter
from langchain_ollama.llms import OllamaLLM
from langchain.prompts import PromptTemplate

## Load data

We used the data provided for the UPSTAGE AI competition.

In [None]:
file_path = '../data/train_dataset/train/dataset.arrow'
with pa.memory_map(file_path, 'r') as source:
    table = pa.ipc.open_stream(source).read_all()
df: pd.DataFrame = table.to_pandas()
df.head()
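
The evaluation loop later reads three fields from each row: `context`, `question`, and `answers['text'][0]`. The sketch below illustrates that assumed layout with a plain dictionary. The record values are hypothetical, any other columns in the real dataset are not shown, and the helper name `first_answer` is not part of the notebook. Because it only uses `[]` indexing, the same helper should also work on a pandas row.

```python
def first_answer(row):
    """Return the first reference answer from a row shaped like the dataset's."""
    texts = row["answers"]["text"]
    if len(texts) == 0:
        raise ValueError("row has no reference answer")
    return texts[0]


# Hypothetical record mirroring the fields the notebook accesses
sample_row = {
    "context": "대통령을 포함한 미국의 행정부 견제권을 갖는 국가 기관은 하원이다.",
    "question": "대통령을 포함한 미국의 행정부 견제권을 갖는 국가 기관은?",
    "answers": {"text": ["하원"]},
}

print(first_answer(sample_row))
```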

## Load Llama2 and set up the prompt

In [3]:
llm = OllamaLLM(model="llama2")

The current code uses a prompt built on repeated instructions and negative instructions. The prompts tried earlier are listed below.

1. Using Reflection and Chain of Thoughts
    ```text
    Based on the following document, answer the question **in Korean only**. Do not answer in English.

    ### Document:
    {context}

    ### Question:
    {question}

    Provide the answer **in one word** or the **most important term** from the answer in Korean.

    ### Step-by-Step Reasoning (Tree of Thoughts):

    1. First, generate multiple possible answers in Korean based on the document, with an emphasis on **important keywords**.
    2. Review the generated answers and check if any of them contain non-Korean language or are too long. If so, refine the answers to ensure they are concise, in **Korean only**, and consist of the **most important term**.
    3. Choose the best possible answer, ensuring it satisfies the requirements (i.e., **Korean language only** and **a single important word or term**).
    4. If the selected answer doesn’t meet the requirements, reflect on the mistake, and repeat the process from step 1, recursively refining the output until the answer fits the conditions.

    ### Example 1:
    Document: 대통령을 포함한 미국의 행정부 견제권을 갖는 국가 기관은 하원이다.
    Question: 대통령을 포함한 미국의 행정부 견제권을 갖는 국가 기관은?
    Answer: 하원

    ### Example 2:
    Document: 현대적 인사조직관리의 시발점이 된 책은 《경영의 실제》이다.
    Question: 현대적 인사조직관리의 시발점이 된 책은?
    Answer: 《경영의 실제》

    ### Example 3:
    Document: 강희제가 1717년에 쓴 글은 백성을 위해 쓰여졌다.
    Question: 강희제가 1717년에 쓴 글은 누구를 위해 쓰여졌는가?
    Answer: 백성

    Now, answer the following question based on the document provided.
    ```

2. Using Chaining Variables
    ```text
    ### Summary Prompt

    Please summarize the following document in Korean, focusing only on the most important details necessary to answer a question about it. The summary should be concise and should not include irrelevant information.

    ### Document:
    {context}

    Provide a summary of the document in **Korean** that captures the main ideas and important details.

    ### Answer Prompt

    Based on the summarized document below, answer the following question **in Korean only**. Do not answer in English. Make sure the answer is concise and directly addresses the question, using the most important term.

    ### Summarized Document:
    {summary}

    ### Question:
    {question}

    Provide the answer in one word or a single sentence in **Korean** that is most relevant to the question.

    ### Final Prompt

    Review the generated answer and check if it meets the following requirements:
    1. The answer is in **Korean only**.
    2. The answer is either a **single word** or a **concise sentence**.
    3. The answer directly addresses the question without adding unnecessary information.

    ### Answer:
    {answer}

    If the answer does not meet these criteria, refine it until it fits. If it meets the criteria, provide a final confirmation.
    ```
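
The second approach feeds the output of each prompt into the next one: `{summary}` comes from the summary prompt and `{answer}` from the answer prompt. The sketch below shows only that data flow. The three template strings are shortened stand-ins for the prompts above, and `run_chain` and `fake_generate` are hypothetical names used for illustration.

```python
from typing import Callable

# Shortened stand-ins for the three prompts above
SUMMARY_PROMPT = "Summarize in Korean:\n{context}"
ANSWER_PROMPT = "Summarized document:\n{summary}\n\nQuestion:\n{question}"
REVIEW_PROMPT = "Review this answer:\n{answer}"


def run_chain(generate: Callable[[str], str], context: str, question: str) -> str:
    """Run summary -> answer -> review, passing each output into the next prompt."""
    summary = generate(SUMMARY_PROMPT.format(context=context))
    answer = generate(ANSWER_PROMPT.format(summary=summary, question=question))
    return generate(REVIEW_PROMPT.format(answer=answer))


# Stub generator so the flow can be inspected without a model
calls = []


def fake_generate(prompt_text: str) -> str:
    calls.append(prompt_text)
    return f"out{len(calls)}"


final = run_chain(fake_generate, "doc", "q")
print(final)
```

With the notebook's model, something like `run_chain(llm.invoke, context, question)` should work in the same way, at the cost of three model calls per question.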

In [4]:
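# The prompt stays in Korean because it is sent to the model verbatim. In English:
# "Analyze the following document and answer the given question **in Korean only**,
#  as a **single phrase** that is as short and concise as possible. Do not use sentences.
#  ... The answer must be a single phrase and must not be a sentence.
#  Write in Korean only."
prompt_template = """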
다음 문서를 분석하여 주어진 질문에 **오직 한국어로**, 가능한 한 짧고 간결한 **단일 구절**로 답변해 주세요. 문장은 사용하지 마세요.

### 문서:
{context}

### 질문:
{question}

답변은 단일 구절이어야 하며, 문장이 되어서는 안 됩니다. 오직 한국어로만 작성해 주세요.
"""

# Create the prompt template
prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"]
)

These functions strip special characters and compute the EM and F1 scores.

In [10]:
def normalize_korean_answer(s):
    def remove_punctuation(text):
        exclude = set(string.punctuation + "·、，．？！＂＇〃《》「」『』〔〕“”‘’〈〉【】()[]{}")
        return "".join(ch for ch in text if ch not in exclude)

    def white_space_fix(text):
        return " ".join(text.split())

    def lower(text):
        return text.lower()

    return white_space_fix(remove_punctuation(lower(s)))

def exact_match(prediction, ground_truth):
    return normalize_korean_answer(prediction) == normalize_korean_answer(ground_truth)

# F1 Score calculation
def f1_score(prediction, ground_truth):
    prediction_tokens = normalize_korean_answer(prediction).split()
    ground_truth_tokens = normalize_korean_answer(ground_truth).split()
    common = Counter(prediction_tokens) & Counter(ground_truth_tokens)
    num_same = sum(common.values())

    if num_same == 0:
        return 0.0

    precision = num_same / len(prediction_tokens)
    recall = num_same / len(ground_truth_tokens)
    f1 = 2 * (precision * recall) / (precision + recall)
    return f1
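
Because `f1_score` splits on whitespace, a prediction such as "하원이다" shares no token with the reference "하원" and scores 0.0. A character-level F1 is one common alternative for Korean short answers. The sketch below is a variant for comparison, not the metric used in this notebook. The name `char_f1` is hypothetical, and it strips only ASCII punctuation, which is narrower than `normalize_korean_answer`.

```python
import string
from collections import Counter


def char_f1(prediction: str, ground_truth: str) -> float:
    """F1 over characters, ignoring whitespace, ASCII punctuation, and case."""
    def chars(text):
        return [
            ch for ch in text.lower()
            if not ch.isspace() and ch not in string.punctuation
        ]

    pred, truth = chars(prediction), chars(ground_truth)
    num_same = sum((Counter(pred) & Counter(truth)).values())
    if num_same == 0:
        return 0.0

    precision = num_same / len(pred)
    recall = num_same / len(truth)
    return 2 * precision * recall / (precision + recall)


print(round(char_f1("하원이다", "하원"), 4))  # 0.6667
```

Here two of the four predicted characters match and both reference characters are covered, so precision is 0.5 and recall is 1.0.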

## Re-feeding the answer through a recursive function

In [None]:
def recursive_answer_generation(context, question, current_answer="", depth=0, max_depth=5):
    if depth >= max_depth:
        return current_answer  # Stop once the maximum recursion depth is reached

    try:
        # Generate a new answer
        result = llm.generate(
            prompts=[prompt.format(context=context, question=question)]
        )
        temp_answer = result.generations[0][0].text.strip()

        # Stop recursing when the new answer matches the previous one
        if temp_answer == current_answer:
            return temp_answer

        # Feed the new answer back in as the context and recurse,
        # passing max_depth along so a caller-supplied limit is respected
        return recursive_answer_generation(
            temp_answer, question, current_answer=temp_answer,
            depth=depth + 1, max_depth=max_depth
        )

    except Exception as e:
        print(f"Error during recursion at depth {depth}: {e}")
        return current_answer  # Return the current answer if an error occurs

exact_matches = []
f1_scores = []

for idx in range(len(df)):
    context = df.iloc[idx]['context']
    question = df.iloc[idx]['question']
    answer = df.iloc[idx]['answers']['text'][0]  # Extract the reference answer

    predicted_answer = recursive_answer_generation(context, question)

    # Evaluate accuracy (Exact Match and F1 score)
    em = exact_match(predicted_answer, answer)
    f1 = f1_score(predicted_answer, answer)

    # Store the metrics
    exact_matches.append(em)
    f1_scores.append(f1)

    # Print the result
    print(f"{idx}.")
    print(f"Question: {question}")
    print(f"Predicted Answer: {predicted_answer}")
    print(f"True Answer: {answer}")
    print(f"Exact Match: {em}, F1 score: {f1}")
    print("-" * 40)

# Compute the overall average metrics
avg_em = np.mean(exact_matches)
avg_f1 = np.mean(f1_scores)

print(f"Average Exact Match: {avg_em:.4f}")
print(f"Average F1 Score: {avg_f1:.4f}")
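
The recursion above amounts to a fixed-point loop: generate an answer, feed it back in as the context, and stop when the output no longer changes or a round limit is reached. The sketch below expresses the same idea iteratively with a stub generator so that it runs without a model. The names `refine_until_stable` and `first_word` are hypothetical, and error handling is omitted for brevity.

```python
from typing import Callable


def refine_until_stable(
    generate: Callable[[str, str], str],
    context: str,
    question: str,
    max_rounds: int = 5,
) -> str:
    """Re-feed each answer as the next context until it stops changing."""
    current = ""
    text = context
    for _ in range(max_rounds):
        new = generate(text, question).strip()
        if new == current:
            break  # Fixed point reached: the answer did not change
        current, text = new, new
    return current


# Stub generator: keeps only the first whitespace-separated word
def first_word(text: str, question: str) -> str:
    words = text.split()
    return words[0] if words else ""


print(refine_until_stable(first_word, "하원 이다 정말", "q"))
```

With the stub, the first round shortens the text to "하원" and the second round returns the same string, so the loop stops after two calls.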