# RAG (Retrieval Augmented Generation) 검색 증강 생성

## 기존 생성형 언어 모델의 문제점
- 학습하지 않은 데이터에 대해서는 알지 못한다.
- Hallucination (거짓된 정보를 생성하는 현상. 모르는 정보를 모른다고 하지 않기 때문에 발생하는 현상)

### Without context
```
Q: Which university does John attend to and what is his major?
A: .... uhhh... Who is John???
```

### With context
```
Q: Which university does John attend to and what is his major?
Context: John was born in 2003 and he is currently a student in Sungkyunkwan University. He majors statistics.
A: John attends Sungkyunkwan University and majors statistics.
```

질문과 관련된 단서(context)를 제공하면 좀 더 정확한 답변을 할 수 있게 된다.

### RAG의 의미
- **R**etrieval: 질문과 관련된 단서를 검색한다.
- **A**ugmented: 검색된 단서로 정답을 보강한다.
- **G**eneration: 질문과 관련 단서를 바탕으로 답변을 생성한다.


# RAG의 단계
- Query: 질문을 한다
- Similarity search: 질문과 유사한 정보(또는 문서 document)를 여러 개 검색한다
- Reranking: 검색된 정보(또는 문서 document)를 다시 순위를 매겨서 가장 유사한 문서들의 순위를 다시 매긴다
- Generate: reranking이 완료된 문서와 질문을 바탕으로 답변을 생성한다.

# 필요 패키지 설치

In [1]:
!pip install -q langchain
!pip install -q langchain-community
!pip install -q -U transformers
!pip install -q -U accelerate
!pip install -q -U bitsandbytes
!pip install -q datasets
!pip install -q sentence-transformers
!pip install -q faiss-gpu
!pip install -q ragatouille

In [2]:
import pandas as pd
import datasets
from langchain.docstore.document import Document as LangchainDocument
from langchain.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores.utils import DistanceStrategy
from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from transformers import pipeline
import torch
from tqdm.notebook import tqdm

# Load Dataset

- 칵테일 레시피 데이터를 사용
- Chatbot은 사용자가 요구하는 칵테일의 제조 방법을 알려준다.
- 해당 데이터셋에서 유사한 문서들을 검색한다. (지식)

In [3]:
DATASET_NAME = "brianarbuckle/cocktail_recipes"
ds = datasets.load_dataset(DATASET_NAME)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


#### 데이터 설명

- title: 칵테일 이름
- ingredients: 재료
- directions: 칵테일 제조 방법
- misc: 부가 정보
- source: 정보의 출처
- ner: 칵테일의 분류

`title`, `ingredients`, `directions`, `ner` 4가지 정보만 사용할 예정이다.

In [4]:
ds['train']

Dataset({
    features: ['title', 'ingredients', 'directions', 'misc', 'source', 'ner'],
    num_rows: 875
})

In [5]:
ds["train"][0]

{'title': '151 Swizzle',
 'ingredients': ['1.5 oz. 151-Proof Demerara Rum [Lemon Hart or El Dorado]',
  '0.5 oz. Lime Juice',
  '0.5 oz. Sugar Syrup',
  '1 Dash Angostura Bitters',
  '6 drops [1\\/8 Tsp.] Pernod',
  '8 oz. crushed ice',
  'blend at high speed for 5 seconds, pour into metal cup or a pilsner glass',
  'Garnish with dust of nutmeg and a cinnamon stick'],
 'directions': [''],
 'misc': [],
 'source': 'Beachbum Berry Remixed',
 'ner': ['pernod', 'rum']}

제조 방법, 재료, ner 등의 정보가 없는 데이터는 제거한다.

In [6]:
# no directions
cnt = 0
for doc in ds['train']:
    if len(doc['directions']) == 1 and doc['directions'][0] == '':
        cnt += 1
print(cnt)

56


In [7]:
# no ingredients
cnt = 0
for doc in ds['train']:
    if len(doc['ingredients']) == 1 and doc['ingredients'][0] == '':
        cnt += 1
print(cnt)

2


In [8]:
# no ner
cnt = 0
for doc in ds['train']:
    if len(doc['ner']) == 0:
        cnt += 1
print(cnt)

122


지식을 저장한다

In [9]:
knowledge_base = []  # list to store cocktail knowledge

for doc in tqdm(ds['train']):
    # skip doc with missing information
    if len(doc['directions']) == 1 and doc['directions'][0] == '':
        continue
    if len(doc['ingredients']) == 1 and doc['ingredients'][0] == '':
        continue
    if len(doc['ner']) == 0:
        continue

    directions = " ".join(doc['directions'])

    info_str = f"Name: {doc['title']}, ingredients: {doc['ingredients']}, directions: {directions}"

    knowledge_base.append(
        LangchainDocument(page_content=info_str, metadata={"ner": doc['ner']})
    )

  0%|          | 0/875 [00:00<?, ?it/s]

# Similarity Search

유사도 검색을 위해서 질문과 저장된 문서들 간의 수학적 연산을 적용해야 한다. 이 때 자주 사용되는 것이 embedding이라는 것이다.

## Embedding이란?

하나의 문장을 숫자의 묶음 (벡터라고 한다)으로 표현해 놓은 것이다. 각 문장 별로 숫자들을 활용해 다른 문장의 숫자 묶음 (벡터)와 수학 연산(코사인 유사도 계산)을 통해 하나의 문장이 다른 문장과 얼마나 유사한지 수치로 나타낼 수 있다.

In [10]:
EMBEDDING_MODEL_NAME = "thenlper/gte-small"

tokenizer = AutoTokenizer.from_pretrained(EMBEDDING_MODEL_NAME)

In [11]:
lengths = [len(doc.page_content) for doc in tqdm(knowledge_base)]
print("Length of the longest document:", max(lengths))

lengths = [len(tokenizer.encode(doc.page_content)) for doc in tqdm(knowledge_base)]
print("Max number of tokens:", max(lengths))

  0%|          | 0/721 [00:00<?, ?it/s]

Length of the longest document: 2087


  0%|          | 0/721 [00:00<?, ?it/s]

Max number of tokens: 542


In [12]:
embedding_model = HuggingFaceEmbeddings(
    model_name=EMBEDDING_MODEL_NAME,
    multi_process=True,
    model_kwargs={"device": "cuda"},
    encode_kwargs={"normalize_embeddings": True}
)

knowlegde_vector_db = FAISS.from_documents(
    knowledge_base,  # cocktail knowledge
    embedding_model,
    distance_strategy=DistanceStrategy.COSINE  # use cosine similarity for similarity search
)

  warn_deprecated(


유사도 검색 해보기

In [13]:
user_query = "How do you make Tonic Gins?"
query_vector = embedding_model.embed_query(user_query)

retrieved_docs = knowlegde_vector_db.similarity_search(query=user_query, k=5)

for retrieved_doc in retrieved_docs:
    print(retrieved_docs)

[Document(metadata={'ner': ['gin']}, page_content="Name: Imperial Gin, ingredients: ['2 ounces gin', '1 ounce fresh lemon juice', '1 ounce fresh lime juice', '2 to 3 ounces chilled club soda', 'Lemon peel spiral'], directions: Shake gin, lemon juice, and lime juice) with ice. Strain into an ice-filled highball glass. Top with club soda and stir gently. Garnish with lemon peel spiral."), Document(metadata={'ner': ['vodka', 'gin']}, page_content="Name: Vodka Tonic, ingredients: ['Those who prefer a Vodka Martini over a Gin Martini will also undoubtedly enjoy this over the classic Gin and Tonic, with a few extra squeezes of lime suggested for additional refreshing lime flavor.', '1 to 3 lime wedges', '2 ounces vodka', '3 to 5 ounces chilled tonic water'], directions: Rim a chilled highball glass with a lime wedge, and drop it in. Fill glass with ice, pour in the vodka, and top with tonic water. Squeeze the remaining lime wedges over the drink, and drop them in."), Document(metadata={'ner'

# Load Generative Model for Chatting

최종 답변을 생성할 모델을 불러오는 과정이다. 모델의 용량이 크기 때문에 오래 걸릴 수 있다.

In [14]:
CHAT_MODEL_NAME = "HuggingFaceH4/zephyr-7b-beta"

# quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(CHAT_MODEL_NAME, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(CHAT_MODEL_NAME)

`low_cpu_mem_usage` was None, now set to True since model is quantized.


Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

모델이 잘 생성하는지 확인

In [15]:
chat_llm = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    do_sample=True,
    temperature=0.1,
    return_full_text=False,
    max_new_tokens=500,
    repetition_penalty=1.1
)

In [16]:
chat_llm(user_query)

[{'generated_text': '\n\nWe use a traditional copper still, which is heated by steam. The botanicals are steeped in neutral grain spirit for 24 hours before being distilled. This allows the flavours to infuse into the spirit and creates a more complex gin. We then add water from our own borehole, which is naturally filtered through chalk and flint, giving it a soft, silky texture.\n\nWhat makes your gins unique compared to other brands?\n\nOur focus on using only the finest botanicals and our commitment to sustainability sets us apart from other brands. We source our botanicals from local suppliers and grow some of them ourselves. Our packaging is also made from recycled materials and we aim to be as environmentally friendly as possible in all aspects of our business. Additionally, our use of a traditional copper still and natural spring water gives our gins a unique flavour profile that is distinctly English.\n\nCan you provide any information about the history of Tonic Gins and how t

모델에게 질문할 template  작성

In [17]:
prompt_in_chat_format = [
    {
        "role": "system",
        "content": """Using the information contained in the context,
give a comprehensive answer to the question.
Respond only to the question asked, response should be concise and relevant to the question.
Provide the number of the source document when relevant.
If the answer cannot be deduced from the context, do not give an answer.""",
    },
    {
        "role": "user",
        "content": """Context:
{context}
---
Now here is the question you need to answer.

Question: {question}""",
    },
]

RAG_PROMPT_TEMPLATE = tokenizer.apply_chat_template(
    prompt_in_chat_format, tokenize=False, add_generation_prompt=True
)

RAG를 활용한 답변 생성

In [18]:
user_query = "How do you make Tonic Gins?"  # question
query_vector = embedding_model.embed_query(user_query)  # change to embedding

# similarity search
retrieved_docs = knowlegde_vector_db.similarity_search(query=user_query, k=5)
retrieved_docs_text = [doc.page_content for doc in retrieved_docs]

context = '\nExtracted documents:\n'
context += "".join([f"Document {str(i)}:::\n" + doc for i, doc in enumerate(retrieved_docs_text)])

final_prompt = RAG_PROMPT_TEMPLATE.format(context=context, question=user_query) # add context to prompt

answer = chat_llm(final_prompt)[0]["generated_text"]  # ask the model for final answer
print(answer)

Based on the provided context and extracted documents, there is no specific recipe or instructions for making Tonic Gins. However, Document 1 mentions using vodka instead of gin in a Tonic variation, suggesting that Tonic Gins could potentially be made by substituting gin for vodka in a traditional Tonic recipe. Here's how you might make a basic Tonic Gin based on this information:

Ingredients:
- 2 oz gin
- 3-5 oz tonic water
- Lime wedge (optional)

Directions:
1. Fill a highball glass with ice.
2. Pour gin over the ice.
3. Add tonic water and stir gently.
4. Squeeze a lime wedge over the drink (optional).
5. Drop the lime wedge into the glass (optional).
6. Enjoy your Tonic Gin!

Note: The exact measurements for the gin and tonic water can be adjusted to personal preference. Some people prefer a stronger gin taste, while others prefer a more subtle one. It's all about finding the right balance for you.


## Reranking 적용

일반적으로 가져온 정보의 정확도를 높이기 위해서 1차적으로 많이 가져온 다음, 별도의 모델을 사용해서 가져온 데이터에 다시 순위를 매기는 과정을 거친다.

In [19]:
from ragatouille import RAGPretrainedModel

reranker = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")



In [20]:
user_query = "How do you make Tonic Gins?"  # query
query_vector = embedding_model.embed_query(user_query)  # change to embedding

retrieved_docs = knowlegde_vector_db.similarity_search(query=user_query, k=10)  # search 10 documents initially
retrieved_docs_text = [doc.page_content for doc in retrieved_docs]

relevant_docs = reranker.rerank(user_query, retrieved_docs_text, k=3)  # rerank the documents and pick 3
relevant_docs = [doc["content"] for doc in relevant_docs]

relevant_docs = relevant_docs[:3]

context = '\nExtracted documents:\n'
context += "".join([f"Document {str(i)}:::\n" + doc for i, doc in enumerate(relevant_docs)])

final_prompt = RAG_PROMPT_TEMPLATE.format(context=context, question=user_query)  # add context to template

answer = chat_llm(final_prompt)[0]["generated_text"]  # generate final answer
print(answer)

100%|██████████| 1/1 [00:00<00:00, 30.12it/s]


Based on the provided context and extracted documents, there are two recipes for making Tonic Gins:

1. Vodka Tonic (Document 0):
   - Ingredients: Vodka, tonic water, lime wedges
   - Directions: Rim a chilled highball glass with a lime wedge, fill with ice, pour in 2 oz of vodka, and top with 3-5 oz of chilled tonic water. Squeeze remaining lime wedges over the drink and drop them in.

2. Imperial Gin (Document 2):
   - Ingredients: Gin, fresh lemon juice, fresh lime juice, chilled club soda, lemon peel spiral
   - Directions: Shake 2 oz of gin, 1 oz of fresh lemon juice, and 1 oz of fresh lime juice with ice. Strain into an ice-filled highball glass and top with 2-3 oz of chilled club soda. Garnish with a lemon peel spiral.

Note that both recipes use tonic water as one of their main ingredients, which gives them a distinct Tonic flavor. However, in Document 0, the recipe suggests adding more lime juice for an extra refreshing taste, while in Document 2, the recipe includes fresh le