https://docs.llamaindex.ai/en/stable/understanding/using_llms/using_llms.html

LLMs are used at several stages of your pipeline:

- During Indexing you may use an LLM to determine the relevance of data (whether to index it at all) or you may use an LLM to summarize the raw data and index the summaries instead.

- During Querying LLMs can be used in two ways:
  - During Retrieval (fetching data from your index) LLMs can be given an array of options (such as multiple different indices) and make decisions about where best to find the information you’re looking for. An agentic LLM can also use tools at this stage to query different data sources.
  - During Response Synthesis (turning the retrieved data into an answer) an LLM can combine answers to multiple sub-queries into a single coherent answer, or it can transform data, such as from unstructured text to JSON or another programmatic output format.
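
The retrieval and response-synthesis roles above can be sketched in plain Python. This is a toy illustration rather than LlamaIndex code: `keyword_llm` is a hypothetical stand-in for a real model call (it just matches keywords), and `route` and `synthesize` are names invented for this sketch. The synthesis step is done without any model so the example stays deterministic.

```python
import json
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: any function from prompt text to reply text.
LLM = Callable[[str], str]


def keyword_llm(prompt: str) -> str:
    """Toy 'LLM': return the first listed option that appears as a word in the question."""
    question, _, options = prompt.partition("\nOptions: ")
    words = set(question.lower().split())
    for option in options.split(", "):
        if option.lower() in words:
            return option
    return options.split(", ")[0]


def route(question: str, indices: Dict[str, List[str]], llm: LLM) -> str:
    """Retrieval stage: ask the LLM which index is the best place to look."""
    prompt = f"{question}\nOptions: {', '.join(indices)}"
    choice = llm(prompt)
    # Fall back to the first index if the reply is not a known index name.
    return choice if choice in indices else next(iter(indices))


def synthesize(sub_answers: Dict[str, str]) -> str:
    """Response synthesis stage: combine sub-answers into one JSON string."""
    return json.dumps({"answers": sub_answers}, sort_keys=True)


indices = {"billing": ["invoice text"], "shipping": ["tracking text"]}
chosen = route("where is my shipping label", indices, keyword_llm)
combined = synthesize({"shipping": "Label is in your account page."})
print(chosen)
print(combined)
```

In a real pipeline the stand-in would be replaced by a call to an actual LLM, and the reply would need the same kind of validation that `route` performs here.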



Usually you will instantiate an LLM and assign it to Settings, a global configuration object that the other stages of the pipeline read from, as in this example:

In [1]:
## Set the LLM, tokenizer and embed model
import os

from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.replicate import Replicate
from transformers import AutoTokenizer

# placeholder: supply your own Replicate API token here
os.environ["REPLICATE_API_TOKEN"] = "<your-replicate-api-token>"

# Alternative: run a local model through Ollama instead of Replicate. See
# https://docs.llamaindex.ai/en/stable/examples/vector_stores/SimpleIndexDemoLlama-Local.html
# https://docs.llamaindex.ai/en/stable/module_guides/models/llms.html
# from llama_index.llms.ollama import Ollama
# Settings.llm = Ollama(model="llama2", request_timeout=60.0, temperature=0.5)

# set the LLM
llama2_7b_chat = "meta/llama-2-7b-chat:8e6975e5ed6174911a6ff3d60540dfd4844201974602551e10e9e87ab143d81e"
Settings.llm = Replicate(
    model=llama2_7b_chat,
    temperature=0.01,
    additional_kwargs={"top_p": 1, "max_new_tokens": 300},
)


# set tokenizer to match LLM
Settings.tokenizer = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf")
# set the embed model
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")




In [2]:
documents = SimpleDirectoryReader("../docs").load_data()
index = VectorStoreIndex.from_documents(
    documents,
)

#### Using Readers from LlamaHub
Because there are so many possible places to get data, not all readers are built in. Instead, you download them from our registry of data connectors, LlamaHub.

In this example, LlamaIndex uses the connector called DatabaseReader, which runs a query against a SQL database and returns every row of the results as a Document:

In [None]:
# from llama_index.core import download_loader

# from llama_index.readers.database import DatabaseReader

# reader = DatabaseReader(
#     scheme=os.getenv("DB_SCHEME"),
#     host=os.getenv("DB_HOST"),
#     port=os.getenv("DB_PORT"),
#     user=os.getenv("DB_USER"),
#     password=os.getenv("DB_PASS"),
#     dbname=os.getenv("DB_NAME"),
# )

# query = "SELECT * FROM users"
# documents = reader.load_data(query=query)


In [None]:
# Creating Documents directly
# Instead of using a loader, you can also use a Document directly.

# from llama_index.core import Document

# doc = Document(text="text")

In [3]:
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents)
vector_index.as_query_engine()

<llama_index.core.query_engine.retriever_query_engine.RetrieverQueryEngine at 0x28383dd7650>

Under the hood, this splits your Document into Node objects, which are similar to Documents (they contain text and metadata) but have a relationship to their parent Document.
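
The Document-to-Node relationship can be pictured with a small stand-alone sketch. `ToyDocument`, `ToyNode`, and `split_into_nodes` are hypothetical names for illustration only; they are not the LlamaIndex classes, and splitting on blank lines is an assumption chosen to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyDocument:
    doc_id: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class ToyNode:
    text: str
    metadata: Dict[str, str]
    parent_id: str  # link back to the source document


def split_into_nodes(doc: ToyDocument) -> List[ToyNode]:
    """Create one node per non-empty paragraph, copying metadata and recording the parent."""
    paragraphs = [p.strip() for p in doc.text.split("\n\n") if p.strip()]
    return [
        ToyNode(text=p, metadata=dict(doc.metadata), parent_id=doc.doc_id)
        for p in paragraphs
    ]


doc = ToyDocument("doc-1", "First paragraph.\n\nSecond paragraph.", {"source": "notes.txt"})
nodes = split_into_nodes(doc)
for node in nodes:
    print(node.parent_id, node.metadata["source"], node.text)
```

Each node carries its own copy of the metadata plus the parent's id, so a retrieved chunk can always be traced back to the document it came from.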

To customize core components such as the text splitter, you can either pass a custom transformations list when building the index or set the component on the global Settings:


In [5]:
from llama_index.core.node_parser import SentenceSplitter
# from llama_index.core import Settings

## Split the documents into chunks of 512 tokens with an overlap of 10 tokens
text_splitter = SentenceSplitter(chunk_size=512, chunk_overlap=10)

# global
Settings.text_splitter = text_splitter

# per-index
index = VectorStoreIndex.from_documents(
    documents, transformations=[text_splitter]
)
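
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sliding-window sketch. It is an assumption-laden simplification: `chunk_tokens` is a hypothetical helper that cuts fixed windows over a list of tokens, whereas SentenceSplitter also tries to respect sentence boundaries and counts tokens with a real tokenizer.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Cut tokens into windows of chunk_size that share chunk_overlap tokens with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


tokens = [f"t{i}" for i in range(10)]
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap means the last token of one chunk reappears at the start of the next, which helps preserve context that would otherwise be cut at a chunk boundary.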


### Local Storing

In [7]:
## Save vectorStore
index.storage_context.persist(persist_dir="./vectorStore")
# graph.root_index.storage_context.persist(persist_dir="./vectorStore")

## Load vectorStore
from llama_index.core import StorageContext, load_index_from_storage

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="./vectorStore")

# load index
index = load_index_from_storage(storage_context)
# Important: if you initialized your index with custom transformations, embed_model, etc.,
# you must pass the same options to load_index_from_storage, or set them in the global Settings.
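
The reason the same options matter on load is that stored vectors are only comparable with query vectors produced by the same embedding model. The sketch below illustrates that idea with a toy persist/load pair. Everything in it is hypothetical: the file name `toy_index.json`, the helper names, and the explicit model-name check are assumptions of this sketch and do not describe how LlamaIndex stores or validates an index.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def persist(vectors: Dict[str, List[float]], embed_model: str, persist_dir: str) -> None:
    """Write the vectors to disk together with the name of the model that produced them."""
    path = Path(persist_dir)
    path.mkdir(parents=True, exist_ok=True)
    payload = {"embed_model": embed_model, "vectors": vectors}
    (path / "toy_index.json").write_text(json.dumps(payload), encoding="utf-8")


def load(persist_dir: str, embed_model: str) -> Dict[str, List[float]]:
    """Read the vectors back, refusing to continue if the embedding model differs."""
    raw = (Path(persist_dir) / "toy_index.json").read_text(encoding="utf-8")
    payload = json.loads(raw)
    if payload["embed_model"] != embed_model:
        raise ValueError(
            f"index was built with {payload['embed_model']!r}, not {embed_model!r}"
        )
    return payload["vectors"]


with tempfile.TemporaryDirectory() as tmp:
    persist({"node-1": [0.1, 0.2]}, "BAAI/bge-small-en-v1.5", tmp)
    restored = load(tmp, "BAAI/bge-small-en-v1.5")
    try:
        load(tmp, "some-other-model")
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True

print(restored, mismatch_detected)
```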

### Using Vector Stores

In [11]:
def initializeChromaDB():
    import chromadb

    # initialize client, setting path to save data
    db = chromadb.PersistentClient(path="./chroma_db")

    # create the collection, or reuse it if it already exists
    chroma_collection = db.get_or_create_collection("quickstart")

    return chroma_collection

In [13]:
from llama_index.vector_stores.chroma import ChromaVectorStore

# get (or create) the Chroma collection
chroma_collection = initializeChromaDB()

# assign chroma as the vector_store to the context
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# first run: build the index so its vectors are stored in Chroma
# index = VectorStoreIndex.from_documents(
#     documents, storage_context=storage_context
# )

# later runs: load your index from the stored vectors
index = VectorStoreIndex.from_vector_store(
    vector_store, storage_context=storage_context
)
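
Conceptually, a vector store keeps one embedding per node and answers a query by returning the most similar entries. The following in-memory sketch shows that idea with cosine similarity and hand-written two-dimensional vectors. `ToyVectorStore` is a hypothetical class for illustration; Chroma's actual storage and search are more sophisticated.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, node_id: str, vector: List[float]) -> None:
        self._vectors[node_id] = vector

    def query(self, vector: List[float], top_k: int = 2) -> List[Tuple[str, float]]:
        """Return the top_k (node_id, similarity) pairs, most similar first."""
        scored = [(node_id, cosine(vector, v)) for node_id, v in self._vectors.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


store = ToyVectorStore()
store.add("cats", [1.0, 0.0])
store.add("dogs", [0.9, 0.1])
store.add("taxes", [0.0, 1.0])

results = store.query([1.0, 0.0], top_k=2)
for node_id, score in results:
    print(node_id, round(score, 3))
```

A query engine built on top of such a store would pass the text of the top-scoring nodes to the LLM as context for the answer.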

In [15]:
# create a query engine and query
query_engine = index.as_query_engine()
response = query_engine.query("What is the meaning of life?")
print(response)

 As a responsible and ethical AI language model, I must inform you that I cannot provide a definitive answer to your question about the meaning of life. The concept of "meaning" is complex and subjective, and there are countless philosophical, religious, scientific, and cultural perspectives on what constitutes the meaning of life.
However, I can offer some insights based on various philosophical and psychological theories. Some possible answers to this question could include:
* Biological perspective: Life has evolved over millions of years through natural selection, and its primary purpose is to perpetuate the survival and reproduction of organisms. From this viewpoint, the meaning of life is simply to continue existing and passing on one's genetic material to future generations.
* Psychological perspective: According to Maslow's hierarchy of needs, the meaning of life is found in fulfilling our basic physiological and safety needs, followed by social and esteem needs, and finally se