# LlamaIndex Knowledge Base
[tutorial](https://betterprogramming.pub/how-to-build-your-own-custom-chatgpt-with-custom-knowledge-base-4e61ad82427e)


## Tree Index

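```python
import os
import logging
import sys

from dotenv import load_dotenv

# Load OPENAI_API_KEY (and any other settings) from a local .env file.
load_dotenv()

# basicConfig already attaches a stdout handler to the root logger; adding a
# second StreamHandler on top of it would print every log record twice.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
```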

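```python
from llama_index import GPTTreeIndex, SimpleDirectoryReader, StorageContext
from IPython.display import Markdown, display

persist_dir = "./tree"
storage_context = StorageContext.from_defaults()

# Load every file in ./data as a Document.
documents = SimpleDirectoryReader('data').load_data()

# Build a tree index: chunks are summarized bottom-up by the LLM, which is why
# this step consumes LLM tokens but no embedding tokens.
new_index = GPTTreeIndex.from_documents(documents, storage_context=storage_context)

# Write the docstore and index store to ./tree so the index can be reloaded later.
storage_context.persist(persist_dir=persist_dir)
```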

```
INFO:llama_index.indices.common_tree.base:> Building index from nodes: 2 chunks
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 10701 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
```
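The tree index spends LLM tokens at build time because parent nodes are summaries of their children. A simplified, pure-Python sketch of that bottom-up construction, assuming a hypothetical `join_summary` function in place of the LLM call; `num_children` is meant to mirror the branching-factor parameter of the same name on `GPTTreeIndex`. This sketch keeps going until a single root remains, whereas LlamaIndex's own builder may stop earlier and keep several root nodes.

```python
from typing import Callable, List


def build_tree_levels(
    chunks: List[str],
    num_children: int,
    summarize: Callable[[List[str]], str],
) -> List[List[str]]:
    """Group nodes num_children at a time and summarize each group into a parent.

    Returns the levels bottom-up: levels[0] holds the leaf chunks and the
    last level holds a single root.
    """
    if num_children < 2:
        raise ValueError("num_children must be at least 2")
    levels = [list(chunks)]
    current = list(chunks)
    while len(current) > 1:
        parents = [
            summarize(current[i : i + num_children])
            for i in range(0, len(current), num_children)
        ]
        levels.append(parents)
        current = parents
    return levels


def join_summary(group: List[str]) -> str:
    # Hypothetical stand-in for the LLM summarization call.
    return "(" + " + ".join(group) + ")"


levels = build_tree_levels(["c1", "c2", "c3", "c4", "c5"], num_children=2, summarize=join_summary)
for depth, level in enumerate(levels):
    print(depth, level)
```

With five chunks and a branching factor of 2 the levels shrink 5 → 3 → 2 → 1; every parent costs one summarization call, which is where build-time LLM tokens go.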


```python
# Query the tree index: the LLM walks down from the root, choosing a child at each level.
query_engine = new_index.as_query_engine()
response = query_engine.query("Are all sugar-free products calorie-free?")
display(Markdown(f"<b>{response}</b>"))
```

```
INFO:llama_index.indices.tree.select_leaf_retriever:>[Level 0] Selected node: [3]/[3]
INFO:llama_index.indices.tree.select_leaf_retriever:>[Level 1] Selected node: [7]/[7]
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 4333 tokens
```

<b>
No, not all sugar-free products are calorie-free. Sugar-free products may still contain calories from other sources such as fat, protein, and carbohydrates. It is important to read the nutrition label to check the calorie content of the product.</b>
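The `[Level 0] Selected node` / `[Level 1] Selected node` log lines come from a top-down walk: at each level one child is chosen and the walk continues into it until a leaf is reached. A minimal sketch of that traversal, assuming a hypothetical word-overlap `relevance` score in place of the LLM's choice; the node texts are made up for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    text: str
    children: List["Node"] = field(default_factory=list)


def tokens(text: str) -> Set[str]:
    # Lowercased alphabetic words; "sugar-free" becomes {"sugar", "free"}.
    return set(re.findall(r"[a-z]+", text.lower()))


def relevance(query: str, text: str) -> int:
    # Hypothetical stand-in for the LLM's choice: count shared words.
    return len(tokens(query) & tokens(text))


def select_leaf_path(roots: List[Node], query: str) -> List[str]:
    """Pick the most relevant node at each level until a leaf is reached."""
    path: List[str] = []
    candidates = roots
    while candidates:
        best = max(candidates, key=lambda node: relevance(query, node.text))
        path.append(best.text)
        candidates = best.children
    return path


roots = [
    Node("sugar and sweeteners", [
        Node("sugar-free products can still contain calories"),
        Node("artificial sweeteners are low in calories"),
    ]),
    Node("exercise and activity", [
        Node("walking daily helps blood glucose"),
    ]),
]

path = select_leaf_path(roots, "Are all sugar-free products calorie-free?")
print(path)
```

Only the nodes along one path are considered, so each level costs an LLM call in the real index (hence the 4333 retrieval tokens) but no embeddings.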

```python
response.response
```

```
'\nNo, not all sugar-free products are calorie-free. Sugar-free products may still contain calories from other sources such as fat, protein, and carbohydrates. It is important to read the nutrition label to check the calorie content of the product.'
```

## Save Index

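```python
from llama_index import GPTVectorStoreIndex, StorageContext

persist_dir = "./persist"

# Use a fresh StorageContext: reusing the tree index's context would store both
# indices together, and loading from ./persist would then need an explicit index_id.
vector_storage_context = StorageContext.from_defaults()

# Embed every chunk (embedding tokens only, no LLM calls) and save the result to ./persist.
vector_store_index = GPTVectorStoreIndex.from_documents(
    documents, storage_context=vector_storage_context
)
vector_storage_context.persist(persist_dir=persist_dir)
```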

```
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 23443 tokens
```
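Building the vector index costs embedding tokens (23443 here) but no LLM tokens, which is the main reason to persist it rather than rebuild it each session. A minimal pure-Python sketch of the persist/reload idea, assuming a hypothetical `toy_embed` function in place of a real embedding model and a single hypothetical `vector_store.json` file; this is not LlamaIndex's on-disk format.

```python
import json
import math
import os
import tempfile
from typing import Dict, List


def toy_embed(text: str, dim: int = 16) -> List[float]:
    """Hypothetical stand-in for an embedding model (hashed bag of words)."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # sum(map(ord, ...)) instead of hash(): str hashes change between runs.
        vec[sum(map(ord, word)) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_store(chunks: List[str]) -> Dict[str, list]:
    # Embed each chunk once; this is the expensive step worth persisting.
    return {"chunks": list(chunks), "vectors": [toy_embed(c) for c in chunks]}


def persist_store(store: Dict[str, list], persist_dir: str) -> None:
    os.makedirs(persist_dir, exist_ok=True)
    with open(os.path.join(persist_dir, "vector_store.json"), "w") as f:
        json.dump(store, f)


def load_store(persist_dir: str) -> Dict[str, list]:
    with open(os.path.join(persist_dir, "vector_store.json")) as f:
        return json.load(f)


store = build_store(["Starchy foods are a major energy source.", "Limit sugary drinks."])
with tempfile.TemporaryDirectory() as tmp:
    persist_store(store, tmp)
    loaded = load_store(tmp)

print(loaded == store)  # True: nothing needs to be re-embedded after reloading
```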


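```python
# Retrieve the chunks most similar to the question by embedding similarity,
# then let the LLM write an answer from them.
query_engine = vector_store_index.as_query_engine()
response = query_engine.query("If I have diabetes, does that mean I can never consume starchy foods?")
display(Markdown(f"<b>{response}</b>"))
```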

```
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 16 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 1818 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
```


<b>
No. Carbohydrate foods, particularly starchy foods such as rice, bread, noodles and cereals, form a major component of the body's energy source. All starchy foods</b>
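Retrieval from the vector index needed only 16 embedding tokens: the question is embedded once and compared against the stored chunk vectors, with no LLM call until the answer is written. A minimal sketch of that ranking step, assuming cosine similarity (a common choice) and made-up 3-dimensional vectors in place of real embeddings; `top_k` is a hypothetical helper, not LlamaIndex's API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Rank chunks by cosine similarity to the query and keep the best k."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings", made up for illustration.
chunk_vecs = {
    "starchy-foods": [1.0, 0.0, 0.0],
    "sugar-intake": [0.6, 0.8, 0.0],
    "exercise": [0.0, 0.0, 1.0],
}
results = top_k([0.8, 0.6, 0.0], chunk_vecs, k=2)
print(results)  # sugar-intake (~0.96) ranks above starchy-foods (~0.8)
```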

## Vector Store Index

```python
from llama_index import GPTVectorStoreIndex, StorageContext, SimpleDirectoryReader
import os
import logging
import sys
from dotenv import load_dotenv
from IPython.display import display, Markdown

# Load OPENAI_API_KEY from .env; the key itself should never be printed in the notebook.
load_dotenv()

# One stdout handler is enough; basicConfig already installs it on the root logger.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
```



```python
from llama_index.retrievers import VectorIndexRetriever
from llama_index.indices.postprocessor import SimilarityPostprocessor
from llama_index.query_engine import RetrieverQueryEngine
from llama_index import ResponseSynthesizer, load_index_from_storage

# Point the storage context at an index previously persisted to ./vector_store.
storage_context = StorageContext.from_defaults(persist_dir="./vector_store")
```

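```python
# One-time build: skip the combined raw file and index only the first 50 documents.
# docs_tail holds the remaining documents and is not indexed in this cell.
documents = SimpleDirectoryReader('data', exclude=['raw_data_combined.txt']).load_data()
docs_head = documents[:50]
docs_tail = documents[50:]
vector_store_index = GPTVectorStoreIndex.from_documents(docs_head, storage_context=storage_context)
```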

```python
# Reload the persisted index instead of re-embedding the documents.
vector_store_index = load_index_from_storage(storage_context=storage_context)
```

```
INFO:llama_index.indices.loading:Loading all indices.
```
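`load_index_from_storage` also accepts an `index_id`; in the LlamaIndex versions this notebook targets it appears to require exactly one index in storage when no id is given, so a `StorageContext` shared by several indices (for example a tree index and a vector index) cannot be loaded without naming one. The sketch below models that rule in plain Python with a hypothetical `load_single_index` helper and made-up index ids; it is not LlamaIndex's code.

```python
from typing import Dict, Optional


def load_single_index(index_store: Dict[str, dict], index_id: Optional[str] = None) -> dict:
    """Return one index struct from a store that may hold several.

    Hypothetical helper: without an index_id, loading only succeeds when
    exactly one index was persisted.
    """
    if index_id is not None:
        return index_store[index_id]
    if len(index_store) != 1:
        raise ValueError(
            f"Expected to load a single index, but got {len(index_store)}; pass index_id"
        )
    return next(iter(index_store.values()))


shared = {"tree-1": {"type": "tree"}, "vector-1": {"type": "vector"}}
print(load_single_index(shared, index_id="vector-1"))  # {'type': 'vector'}
try:
    load_single_index(shared)
except ValueError as err:
    print(err)
```

Keeping one index per storage directory (or recording each index id when persisting) avoids the ambiguity.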


```python
# Retrieve the 2 most similar chunks for each query.
retriever = VectorIndexRetriever(
    index=vector_store_index,
    similarity_top_k=2,
)

# Drop retrieved chunks whose similarity score is below 0.85 before the LLM answers.
response_synthesizer = ResponseSynthesizer.from_args(
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.85)
    ]
)

# Combine retrieval and response synthesis into a single query engine.
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer
)

response = query_engine.query('Do I get diabetes from sweets?')
display(Markdown(f"<b>{response}</b>"))
```

```
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 140 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 251 tokens
```

<b>
No, you do not get diabetes from eating sweets. Eating too much sugar can increase your risk of developing diabetes, but it is not the direct cause. Eating a balanced diet and exercising regularly can help reduce your risk of developing diabetes.</b>
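The custom query engine applies two filters in sequence: the retriever keeps the `similarity_top_k=2` best chunks, then `SimilarityPostprocessor(similarity_cutoff=0.85)` discards any of those scoring below the cutoff. A minimal sketch of that sequence with made-up scores; `retrieve_top_k` and `apply_similarity_cutoff` are hypothetical helpers, and treating the cutoff as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredNode:
    text: str
    score: float  # similarity between the query and this chunk


def retrieve_top_k(nodes: List[ScoredNode], k: int) -> List[ScoredNode]:
    # Keep the k highest-scoring chunks.
    return sorted(nodes, key=lambda n: n.score, reverse=True)[:k]


def apply_similarity_cutoff(nodes: List[ScoredNode], cutoff: float) -> List[ScoredNode]:
    # Keep only chunks at or above the cutoff (assumed inclusive here).
    return [n for n in nodes if n.score >= cutoff]


candidates = [
    ScoredNode("Eating too much sugar can raise diabetes risk.", 0.91),
    ScoredNode("Starchy foods are a major energy source.", 0.83),
    ScoredNode("Walking daily helps blood glucose.", 0.62),
]
kept = apply_similarity_cutoff(retrieve_top_k(candidates, k=2), cutoff=0.85)
print([n.text for n in kept])  # only the 0.91 chunk survives
```

If every retrieved chunk falls below the cutoff, the LLM receives no context at all, so a threshold as high as 0.85 is worth checking against the scores the embedding model actually produces.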