데이터를 쿼리할 때 마다 인덱스를 다시 생성해야 함 -> 인덱스를 디스크에 저장하여 다음에 실행할 때 사용

In [1]:
import logging
import sys
import os
from dotenv import load_dotenv
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

load_dotenv()

import llama_index
llama_index.__version__

'0.9.42.post1'

In [2]:
from llama_index import VectorStoreIndex, SimpleDirectoryReader, StorageContext, load_index_from_storage

# 인덱스가 존재하는지 확인하고, 없을 때만 다시 빌드
try:
    storage_context = StorageContext.from_defaults(persist_dir='./storage/cache/papers/llama2/')
    index = load_index_from_storage(storage_context)
    print('loading from disk')
except:
    documents = SimpleDirectoryReader('assets').load_data()
    # 노드 파싱, 임베딩
    index = VectorStoreIndex.from_documents(documents=documents)
    # 인덱스를 디스크에 지속적으로 가지고 있음
    index.storage_context.persist(persist_dir='./storage/cache/papers/llama2/')
    print('persisting from disk')

DEBUG:llama_index.storage.kvstore.simple_kvstore:Loading llama_index.storage.kvstore.simple_kvstore from ./storage/cache/papers/llama2/docstore.json.
Loading llama_index.storage.kvstore.simple_kvstore from ./storage/cache/papers/llama2/docstore.json.
DEBUG:fsspec.local:open file: /Users/jeonjunhwi/문서/Projects/LLM_Study/llamaindex_langchain_compare/storage/cache/papers/llama2/docstore.json
open file: /Users/jeonjunhwi/문서/Projects/LLM_Study/llamaindex_langchain_compare/storage/cache/papers/llama2/docstore.json
DEBUG:llama_index.storage.kvstore.simple_kvstore:Loading llama_index.storage.kvstore.simple_kvstore from ./storage/cache/papers/llama2/index_store.json.
Loading llama_index.storage.kvstore.simple_kvstore from ./storage/cache/papers/llama2/index_store.json.
DEBUG:fsspec.local:open file: /Users/jeonjunhwi/문서/Projects/LLM_Study/llamaindex_langchain_compare/storage/cache/papers/llama2/index_store.json
open file: /Users/jeonjunhwi/문서/Projects/LLM_Study/llamaindex_langchain_c

In [3]:
# debug로 설정하면 OpenAI에 어떤 프롬프트가 보내졌는지 볼 수 있음
import openai
openai.log = 'debug'

In [6]:
from llama_index.prompts import PromptTemplate

text_qa_template_str = (
    # default template
    "Context information is below.\n"
    "---------------------------\n"
    "{context_str}\n"
    "---------------------------\n"
    "Using both the context information and also using your own knowledge, "
    "answer the question : {query_str}\n"
    "If the context isn't helpful, you can also answer the question on your own.\n"   
)
text_qa_template = PromptTemplate(text_qa_template_str)

In [9]:
from llama_index.prompts import PromptTemplate

add_system_qa_template_str = (
    # system message 추가
    "You are author of llama2"
    "Always answer the query only using the provided context information,"
    "and not prior knowledge.\n"
    "Some rules to follow:\n"
    "1. Never directly reference the given context in your answer.\n"
    "2. Avoid statements like 'Based on the context, ...' or"
    "'The context information ...' or anything along "
    "those lines."
    
    "Context information is below.\n"
    "---------------------------\n"
    "{context_str}\n"
    "---------------------------\n"
    "Answer the question : {query_str}\n"
)
add_system_qa_template = PromptTemplate(text_qa_template_str)

In [13]:
response_1 = index.as_query_engine(
    text_qa_template=text_qa_template
    ).query('What is llama2?')

response_2 = index.as_query_engine(
    text_qa_template=add_system_qa_template
    ).query('What is llama2?')

DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/embeddings', 'files': None, 'post_parser': <function Embeddings.create.<locals>.parser at 0x150749ab0>, 'json_data': {'input': ['What is llama2?'], 'model': <OpenAIEmbeddingModeModel.TEXT_EMBED_ADA_002: 'text-embedding-ada-002'>, 'encoding_format': 'base64'}}
Request options: {'method': 'post', 'url': '/embeddings', 'files': None, 'post_parser': <function Embeddings.create.<locals>.parser at 0x150749ab0>, 'json_data': {'input': ['What is llama2?'], 'model': <OpenAIEmbeddingModeModel.TEXT_EMBED_ADA_002: 'text-embedding-ada-002'>, 'encoding_format': 'base64'}}
DEBUG:httpcore.connection:close.started
close.started
DEBUG:httpcore.connection:close.complete
close.complete
DEBUG:httpcore.connection:connect_tcp.started host='api.openai.com' port=443 local_address=None timeout=60.0 socket_options=None
connect_tcp.started host='api.openai.com' port=443 local_address=None timeout=60.0 socket_options=None
DEBUG:httpcore.connect

In [14]:
print(response_1)

Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) developed by the authors. These models range in scale from 7 billion to 70 billion parameters. The authors specifically mention Llama 2-Chat, which is optimized for dialogue use cases. The models have demonstrated competitiveness with existing open-source chat models and are considered a suitable substitute for closed-source models based on evaluations for helpfulness and safety. The authors provide a detailed description of their approach to fine-tuning and safety improvements in order to enable the community to build on their work and contribute to the responsible development of LLMs.


In [15]:
print(response_2)

Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) developed and released by the authors of the paper. These models range in scale from 7 billion to 70 billion parameters. The authors specifically mention Llama 2-Chat, which is optimized for dialogue use cases. The models have demonstrated competitiveness with existing open-source chat models and are considered a suitable substitute for closed-source models based on evaluations for helpfulness and safety. The authors provide a detailed description of their approach to fine-tuning and safety improvements in order to enable the community to build on their work and contribute to the responsible development of LLMs.


In [3]:
from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.postprocessor import SimilarityPostprocessor, KeywordNodePostprocessor
from llama_index.response_synthesizers import get_response_synthesizer


query_engine = index.as_query_engine()
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=4,
)
s_processor = SimilarityPostprocessor(similarity_cutoff=0.5) # 스코어 컷오프 지정
k_processor = KeywordNodePostprocessor(
    exclude_keywords=['cummecial'],
    # required_keywords=['llama2']
)
# 디버깅을 위해서 llm으로 보내지 않음(토큰 사용 x) 쿼리 엔진을 실행하면 final answer가 나오지 않음.
response_synthesize = get_response_synthesizer(
    response_mode='no_text' 
)
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[k_processor, s_processor],
    # response_synthesizer=response_synthesize
)

In [4]:
from llama_index.response.pprint_utils import pprint_response

response = query_engine.query('what is llama2?')
pprint_response(response, show_source=True) # 보여주기만 하는 기능
print(response)

Final Response: Llama 2 is a collection of pretrained and fine-tuned
large language models (LLMs) ranging in scale from 7 billion to 70
billion parameters. These models, called Llama 2-Chat, are optimized
for dialogue use cases and have demonstrated competitiveness with
existing open-source chat models. They are considered to be a suitable
substitute for closed-source models in terms of helpfulness and
safety. Llama 2 is made available for both research and commercial
use, and developers must comply with the terms of the provided license
and the Acceptable Use Policy. The responsible release of Llama 2 aims
to encourage responsible AI innovation and collaboration within the AI
community.
______________________________________________________________________
Source Node 1/4
Node ID: b8d70caa-410d-486b-bcfc-bd93e23062e7
Similarity: 0.8378943080728823
Text: (2021)alsoilluminatesthedifficultiestiedtochatbot-oriented LLMs,
with concerns ranging from privacy to misleading expertise claims.
D

In [31]:
# 그냥 응답의 모든 정보를 보려면
print(response.source_nodes)

[NodeWithScore(node=TextNode(id_='ff89be49-96f5-4ad9-b2a4-0f07fed644cd', embedding=None, metadata={'page_label': '36', 'file_name': 'llama2.pdf', 'file_path': 'assets/llama2.pdf', 'file_type': 'application/pdf', 'file_size': 13661300, 'creation_date': '2024-02-04', 'last_modified_date': '2023-12-16', 'last_accessed_date': '2024-02-04'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='6fa2b14b-261f-43bb-8201-1192e3dd2c8b', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '36', 'file_name': 'llama2.pdf', 'file_path': 'assets/llama2.pdf', 'file_type': 'application/pdf', 'file_size': 13661300, 'creation_date': '2024-02-04', 'last_modified_date': '2023-12-16', 'last_accessed_date': '2024-02-04'}, ha