Without persistence, the index has to be rebuilt every time the data is queried -> save the index to disk and reuse it on the next run

In [4]:
# import logging
# import sys
import os
from dotenv import load_dotenv
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

load_dotenv()

import llama_index
llama_index.__version__

'0.9.42.post1'

In [5]:
import tiktoken
from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, TokenCountingHandler

token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("text-embedding-ada-002").encode,
    verbose=True,
)
callback_manager = CallbackManager([token_counter])
service_context = ServiceContext.from_defaults(callback_manager=callback_manager)
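
To make the role of the handler concrete, here is a minimal, library-free sketch of the idea behind token counting: a tokenizer callable turns text into tokens, and a counter accumulates how many it has seen. The class `SimpleTokenCounter`, its `on_embedding` method, and the whitespace tokenizer are hypothetical stand-ins for illustration, not LlamaIndex's implementation. In the cell above, the `encode` function from `tiktoken` plays the tokenizer role.

```python
from typing import Callable, List


class SimpleTokenCounter:
    """Hypothetical stand-in that mimics the idea of a token-counting handler."""

    def __init__(self, tokenizer: Callable[[str], List]) -> None:
        self.tokenizer = tokenizer
        self.total_embedding_token_count = 0

    def on_embedding(self, text: str) -> int:
        # Count the tokens in one piece of text and add them to the running total.
        n = len(self.tokenizer(text))
        self.total_embedding_token_count += n
        return n

    def reset_counts(self) -> None:
        # Start a fresh measurement, e.g. before timing a single query.
        self.total_embedding_token_count = 0


# A whitespace split is only a toy tokenizer; real tokenizers produce different counts.
counter = SimpleTokenCounter(tokenizer=str.split)
counter.on_embedding("what is llama2?")
print(counter.total_embedding_token_count)
```

Resetting the counter before each measurement is what lets a later cell report the tokens used by one query in isolation.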

In [6]:
from llama_index import VectorStoreIndex, SimpleDirectoryReader, StorageContext, load_index_from_storage

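PERSIST_DIR = './storage/cache/papers/llama2/'

# Check whether a persisted index exists; rebuild only when it does not
try:
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    # Pass service_context so the token counter is also attached to a loaded index
    index = load_index_from_storage(storage_context, service_context=service_context)
    print('loading from disk')
except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
    documents = SimpleDirectoryReader('assets').load_data()
    # Parse the documents into nodes and embed them
    index = VectorStoreIndex.from_documents(documents=documents, service_context=service_context)
    # Persist the index to disk so the next run can load it
    index.storage_context.persist(persist_dir=PERSIST_DIR)
    print('persisting from disk')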
    
print(token_counter.total_embedding_token_count)

Embedding Token Usage: 72274
Embedding Token Usage: 7479
persisting from disk
79753
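
The load-or-build decision can also be made explicitly instead of relying on an exception. The sketch below uses only the standard library; the helper name `has_persisted_index` is hypothetical, and treating "directory exists and contains at least one file" as "an index was persisted" is a simplifying assumption, since it does not verify that the files are a valid index.

```python
import os


def has_persisted_index(persist_dir: str) -> bool:
    """Return True if persist_dir exists and contains at least one file."""
    if not os.path.isdir(persist_dir):
        return False
    return any(
        os.path.isfile(os.path.join(persist_dir, name))
        for name in os.listdir(persist_dir)
    )


PERSIST_DIR = "./storage/cache/papers/llama2/"
if has_persisted_index(PERSIST_DIR):
    print("found a persisted index; load it from disk")
else:
    print("no persisted index; build it and persist it")
```

An explicit check keeps unrelated errors, such as a missing API key during embedding, from being mistaken for a missing cache.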


In [12]:
token_counter.reset_counts()
response = index.as_query_engine().query('what is llama2?')
print('embedding tokens :', token_counter.total_embedding_token_count, '\n',
      'LLM prompts :', token_counter.prompt_llm_token_count, '\n',
      'LLM completions :', token_counter.completion_llm_token_count, '\n',
      'Total LLM token count :', token_counter.total_llm_token_count, '\n',
)
print(response)

Embedding Token Usage: 5
LLM Prompt Token Usage: 1179
LLM Completion Token Usage: 121
embedding tokens : 5 
 LLM prompts : 1179 
 LLM completions : 121 
 Total LLM token count : 1300 

Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. These models, specifically the Llama 2-Chat models, are optimized for dialogue use cases. They have been developed and released with the aim of outperforming open-source chat models and potentially serving as a substitute for closed-source models. The approach to fine-tuning and safety improvements of Llama 2-Chat is described in detail to enable the community to build on this work and contribute to the responsible development of LLMs.
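
The recorded counts show why persisting matters: building the index embedded 79,753 tokens, while a query embeds only 5. A small worked calculation, using only numbers from the outputs in this notebook, compares rebuilding on every run with building once. The function name `embedding_tokens`, the choice of 10 queries, and the assumption of one query per run are hypothetical.

```python
# Token counts copied from the recorded notebook outputs.
BUILD_EMBEDDING_TOKENS = 72274 + 7479  # two embedding batches logged during the build
QUERY_EMBEDDING_TOKENS = 5
QUERY_PROMPT_TOKENS = 1179
QUERY_COMPLETION_TOKENS = 121


def embedding_tokens(n_queries: int, rebuild_each_time: bool) -> int:
    """Total embedding tokens for n_queries, assuming one query per run."""
    builds = n_queries if rebuild_each_time else 1
    return builds * BUILD_EMBEDDING_TOKENS + n_queries * QUERY_EMBEDDING_TOKENS


n = 10  # hypothetical number of queries
print("LLM tokens per query    :", QUERY_PROMPT_TOKENS + QUERY_COMPLETION_TOKENS)
print("rebuild on every run    :", embedding_tokens(n, rebuild_each_time=True))
print("build once, then reload :", embedding_tokens(n, rebuild_each_time=False))
```

Under these assumptions, the per-query LLM usage is the same either way; only the embedding work for the documents is saved by loading from disk.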
