# RAG with LLaMa
- local env: RAG

References
- https://github.com/nicknochnack/Llama2RAG/tree/main
- https://agi-sphere.com/retrieval-augmented-generation-llama2/
- https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/llm-field-guide/llama-2/llama-2-13b-retrievalqa.ipynb#scrollTo=lhXARZQXq6QD

In [12]:
!pip install llama-index==0.7.21 llama_hub==0.0.19 openai==0.27.0 sentence-transformers

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

In [1]:
# Import transformer classes for generation
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
# Import torch for datatype attributes 
import torch
import json



In [4]:
with open("./secrets.json", "r") as file:
    secrets = json.load(file)
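
The cell above fails if `./secrets.json` is missing. As a minimal sketch, a small helper could check an environment variable first and fall back to the JSON file. The helper name `load_secret` is hypothetical and not part of the original notebook; the key names are whatever the secrets file already uses.

```python
import json
import os
from pathlib import Path


def load_secret(key, path="./secrets.json"):
    """Return a secret from the environment, else from a JSON file.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    # Environment variables take precedence, so the file is optional.
    value = os.environ.get(key)
    if value:
        return value

    # Fall back to the JSON secrets file if it exists.
    secrets_file = Path(path)
    if secrets_file.exists():
        with secrets_file.open("r", encoding="utf-8") as fh:
            value = json.load(fh).get(key)

    if not value:
        raise KeyError(f"{key} not found in the environment or in {path}")
    return value
```

With this in place, `auth_token = load_secret("HUGGINGFACE_AUTH_TOKEN")` would work whether the token lives in the shell environment or in the file.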

In [2]:
# Name of the Llama 2 chat checkpoint on the Hugging Face Hub
name = "daryl149/llama-2-7b-chat-hf"

# Read the Hugging Face auth token and the Pinecone key from the secrets file
auth_token = secrets['HUGGINGFACE_AUTH_TOKEN']
pinecone_key = secrets['PINECONE_KEY']

In [3]:
# Create tokenizer
tokenizer = AutoTokenizer.from_pretrained(name, 
    cache_dir='./model/', use_auth_token=auth_token)



In [4]:
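# Set torch device: use Apple's MPS backend when available, otherwise fall back to the CPU
device = "mps" if torch.backends.mps.is_available() else "cpu"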

In [5]:
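# Create model in half precision; the rope_scaling factor is written as a float,
# which is the type the transformers config validation expects
model = AutoModelForCausalLM.from_pretrained(name, 
    cache_dir='./model/', use_auth_token=auth_token, torch_dtype=torch.float16, 
    rope_scaling={"type": "dynamic", "factor": 2.0}, load_in_8bit=False).to(device)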

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████| 2/2 [00:48<00:00, 24.42s/it]


In [6]:
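# Set up a prompt; adjacent string literals are joined, which avoids the runs of
# spaces that backslash continuations would embed in the string
prompt = (
    "### User:What is the fastest car in the world "
    "and how much does it cost? "
    "### Assistant:"
)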
# Pass the prompt to the tokenizer
inputs = tokenizer(prompt, return_tensors="pt").to(device)   # model.device
# Setup the text streamer 
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

In [7]:
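# Run generation; max_new_tokens expects an integer, so cap the reply length
# (512 is an arbitrary choice) instead of passing float('inf')
output = model.generate(**inputs, streamer=streamer, 
                        use_cache=True, max_new_tokens=512)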

The fastest car in the world is the Bugatti Chiron Super Sport 300+, which has a top speed of 330 miles per hour (mph) and a price tag of around $3.







In [8]:
# Convert the output tokens back to text
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
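
For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so `output_text` above also contains the prompt. A minimal sketch of dropping the prompt before decoding follows; the helper name `strip_prompt_tokens` is hypothetical, and plain lists stand in for the token tensors.

```python
def strip_prompt_tokens(output_ids, prompt_length):
    """Return only the tokens generated after the prompt.

    Hypothetical helper for illustration. Works on any sliceable sequence,
    including a 1-D torch tensor such as output[0].
    """
    return output_ids[prompt_length:]


# Toy example: three prompt tokens followed by two generated tokens.
toy_output = [101, 102, 103, 7, 8]
generated_only = strip_prompt_tokens(toy_output, prompt_length=3)
print(generated_only)
```

In the notebook this would look like `tokenizer.decode(strip_prompt_tokens(output[0], inputs["input_ids"].shape[1]), skip_special_tokens=True)`.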

In [9]:
# Import the prompt wrapper used by LlamaIndex
from llama_index.prompts.prompts import SimpleInputPrompt

In [10]:
# Create a system prompt
system_prompt = """[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as
helpfully as possible, while being safe. Your answers should not include
any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.
Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain
why instead of answering something not correct. If you don't know the answer
to a question, please don't share false information.

Your goal is to provide answers relating to the financial performance of
the company.<</SYS>>
"""
# Throw together the query wrapper
query_wrapper_prompt = SimpleInputPrompt("{query_str} [/INST]")

In [11]:
# Complete the query prompt
query_wrapper_prompt.format(query_str='hello')

'hello [/INST]'
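
Llama 2 chat models expect a single-turn prompt of the form `[INST] <<SYS>> system text <</SYS>> user text [/INST]`. As a rough sketch of how the system prompt and the wrapped query fit together, the function below assembles that string by hand. The name `build_llama2_prompt` is hypothetical; the exact way LlamaIndex joins the two pieces may differ in whitespace.

```python
def build_llama2_prompt(system_prompt: str, query_str: str) -> str:
    """Assemble a single-turn Llama 2 chat prompt.

    Hypothetical helper for illustration; it mirrors the layout of the
    system prompt and query wrapper defined in the notebook.
    """
    return (
        "[INST] <<SYS>>\n"
        f"{system_prompt.strip()}\n"
        "<</SYS>>\n\n"
        f"{query_str.strip()} [/INST]"
    )


prompt_text = build_llama2_prompt("You are a helpful assistant.", "hello")
print(prompt_text)
```

The model's reply is expected to start right after the closing `[/INST]` marker, which is why the query wrapper ends with it.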

In [12]:
# Import the llama index HF Wrapper
from llama_index.llms import HuggingFaceLLM
# Create a HF LLM using the llama index wrapper 
llm = HuggingFaceLLM(context_window=4096,
                    max_new_tokens=256,
                    system_prompt=system_prompt,
                    query_wrapper_prompt=query_wrapper_prompt,
                    model=model,
                    tokenizer=tokenizer)

In [13]:
# Bring in embeddings wrapper
from llama_index.embeddings import LangchainEmbedding
# Bring in HF embeddings - need these to represent document chunks
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

In [14]:
# Create the embeddings instance (downloads the model on first use)
embeddings=LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
)



In [15]:
# Import the helpers for configuring the service context
from llama_index import set_global_service_context
from llama_index import ServiceContext

In [16]:
# Create new service context instance
service_context = ServiceContext.from_defaults(
    chunk_size=1024,
    llm=llm,
    embed_model=embeddings
)
# And set the service context
set_global_service_context(service_context)
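
The `chunk_size` setting controls how large each piece of a document is before it gets embedded. The sketch below illustrates the general idea with a deliberately simplified splitter that counts whitespace-separated words; it is not LlamaIndex's splitter, which works on tokens, and the name `chunk_words` and the `overlap` default are assumptions made for the example.

```python
def chunk_words(text, chunk_size=1024, overlap=20):
    """Split text into overlapping chunks of at most chunk_size words.

    Simplified illustration only: real splitters count tokens, not words.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


demo_chunks = chunk_words("0 1 2 3 4 5 6 7 8 9", chunk_size=4, overlap=1)
print(demo_chunks)
```

Smaller chunks tend to give more focused retrieval hits, while larger chunks keep more surrounding context in each hit.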

In [17]:
# Import deps to load documents 
from llama_index import VectorStoreIndex, download_loader
from pathlib import Path

# Download PDF Loader 
PyMuPDFReader = download_loader("PyMuPDFReader")
# Create PDF Loader
loader = PyMuPDFReader()
# Load documents 
documents = loader.load(file_path=Path('../data/퇴직연금/개인형 퇴직연금 IRP계좌 세액공제, 해지 세금, 수령방법 _ 네이버 블로그.pdf'), metadata=True)

In [18]:
# Create an index - we'll be able to query this in a sec
index = VectorStoreIndex.from_documents(documents)
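
Conceptually, a vector index stores one embedding per chunk and, at query time, returns the chunks whose embeddings are closest to the query embedding. A minimal sketch of that lookup with cosine similarity is below. The two-dimensional vectors are toy stand-ins for real embeddings, and the function names and `k=2` are illustrative assumptions rather than LlamaIndex internals.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunks most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), idx)
        for idx, vec in enumerate(chunk_vecs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy embeddings for three chunks and one query.
toy_chunks = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
best = top_k([0.9, 0.1], toy_chunks, k=2)
print(best)
```

The retrieved chunks are then placed into the prompt as context for the LLM, which is the "retrieval-augmented" part of RAG.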

In [19]:
# Setup index query engine using LLM 
query_engine = index.as_query_engine()

In [20]:
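# Test out a query in natural language (Korean for: "What is the purpose of the
# Act on the Guarantee of Employees' Retirement Benefits?")
response = query_engine.query("근로자퇴직급여 보장법의 목적은?")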

In [21]:
response

Response(response=" Based on the new context provided, the purpose of the labor pension insurance (근로자퇴직급여 보장법) in South Korea can be further refined as follows:\n1. Financial support: The insurance provides a stable source of income during retirement, helping employees to cover their living expenses and maintain their financial stability.\n2. Income replacement: The insurance aims to replace a portion of the employee's pre-retirement income, allowing them to maintain their living standards and enjoy their post-work years with financial security.\n3. Protection against poverty: By providing a stable source of income during retirement, the insurance helps to protect employees from poverty and financial insecurity during their post-work years.\nIn addition, the insurance also provides other benefits, such as:\n4. Tax benefits: The insurance can provide tax benefits to employees, such as reducing their tax burden and increasing their disposable income.\n5. Financial planning: The insuranc

In [23]:
print(response.response)

 Based on the new context provided, the purpose of the labor pension insurance (근로자퇴직급여 보장법) in South Korea can be further refined as follows:
1. Financial support: The insurance provides a stable source of income during retirement, helping employees to cover their living expenses and maintain their financial stability.
2. Income replacement: The insurance aims to replace a portion of the employee's pre-retirement income, allowing them to maintain their living standards and enjoy their post-work years with financial security.
3. Protection against poverty: By providing a stable source of income during retirement, the insurance helps to protect employees from poverty and financial insecurity during their post-work years.
In addition, the insurance also provides other benefits, such as:
4. Tax benefits: The insurance can provide tax benefits to employees, such as reducing their tax burden and increasing their disposable income.
5. Financial planning: The insurance can help employees plan