# llama-cpp-python



This Python package provides simple bindings for the llama.cpp library. It offers low-level access to the C API through a ctypes interface, a high-level Python API for text completion, an OpenAI-like API, and LangChain compatibility. It supports multiple BLAS backends for faster processing and also includes a web server.

`llama.cpp` aims to run the LLaMA model with 4-bit integer quantization on a MacBook. It is a plain C/C++ implementation optimized for Apple silicon and x86 architectures, and it supports several integer quantization formats and BLAS libraries. Originally a quick proof of concept, it now serves as the main development playground for new ggml library features.

# Load Quantized Models from Hugging Face

The Hugging Face community publishes quantized models, which make it practical to run the model efficiently on T4 and consumer-grade GPUs. Make sure a model comes from a reliable source before using it.
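As a rough sketch of why quantization matters, the size of the weights is approximately the parameter count times the bits per weight. The inputs below are assumptions: about 6.74 billion parameters for Llama-2-7B and approximate bits-per-weight values for each GGUF format. The estimate ignores the KV cache and runtime overhead.

```python
def estimate_weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: Llama-2-7B has roughly 6.74e9 parameters.
N_PARAMS = 6.74e9

# Assumed, approximate bits per weight for a few formats.
formats = [
    ("FP16", 16.0),
    ("Q8_0 (~8.5 bpw)", 8.5),
    ("Q5_K_S (~5.5 bpw)", 5.5),
    ("Q4_0 (~4.5 bpw)", 4.5),
]

for label, bpw in formats:
    print(f"{label:>18}: ~{estimate_weights_gb(N_PARAMS, bpw):.1f} GB")
```

Under these assumptions, FP16 weights alone take roughly 13.5 GB, close to the 16 GB of a T4, while the Q5_K_S file used here needs roughly 4.6 GB.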

In [None]:
# For downloading the models
%pip install huggingface_hub

In [1]:
model_cache_dir = "./models_cache"
model_name_or_path = "TheBloke/Llama-2-7B-Chat-GGUF"
model_filename = "llama-2-7b-chat.Q5_K_S.gguf"

Set the model path if the model has already been downloaded.

In [2]:
model_path = "models_cache\\models--TheBloke--Llama-2-7B-Chat-GGUF\\snapshots\\191239b3e26b2882fb562ffccdd1cf0f65402adb\\llama-2-7b-chat.Q5_K_S.gguf"
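Rather than hard-coding the snapshot hash, a small helper can look the file up in the cache. This is a minimal sketch that assumes the Hugging Face Hub cache layout visible in the download output below (`models--<org>--<name>/snapshots/<revision>/<filename>`); `find_cached_gguf` is a hypothetical helper name. `huggingface_hub.try_to_load_from_cache` is a library alternative.

```python
from pathlib import Path
from typing import Optional


def find_cached_gguf(cache_dir: str, repo_id: str, filename: str) -> Optional[str]:
    """Return the path of `filename` in a Hugging Face Hub cache, or None.

    Assumes the cache layout:
    <cache_dir>/models--<org>--<name>/snapshots/<revision>/<filename>
    """
    repo_folder = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    # glob() on a missing directory simply yields nothing
    matches = list((repo_folder / "snapshots").glob(f"*/{filename}"))
    if not matches:
        return None
    # If several revisions are cached, prefer the most recently modified file
    return str(max(matches, key=lambda p: p.stat().st_mtime))
```

In this notebook it would be called as `find_cached_gguf(model_cache_dir, model_name_or_path, model_filename)`, falling back to `hf_hub_download` when it returns `None`.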

Or, download the model from the Hugging Face Hub.

In [4]:
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id=model_name_or_path,
    filename=model_filename,
    cache_dir=model_cache_dir,
)

print(f"downloaded model_path::\n{model_path}")


downloaded model_path::
./models_cache\models--TheBloke--Llama-2-7B-Chat-GGUF\snapshots\191239b3e26b2882fb562ffccdd1cf0f65402adb\llama-2-7b-chat.Q5_K_S.gguf


# Retrieval-Augmented Generation with LlamaIndex & ChromaDB
https://docs.llamaindex.ai/en/stable/understanding/storing/storing/



Install llama-index and the integrations used below. This installs a CPU-only build of llama-cpp-python by default.

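Starting with version 0.10.0, most classes are imported from `llama_index.core`. Integrations ship as separate packages named `llama-index-[package]-[sub-package]-[target]` (for example, `llama-index-vector-stores-chroma`, imported as `llama_index.vector_stores.chroma`).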

In [None]:
%pip install llama-index
%pip install llama-index-vector-stores-chroma
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-llama-cpp

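For GPU inference, install with NVIDIA cuBLAS support enabled (recommended; faster).

The command below installs PyTorch, which the Hugging Face embedding model uses. pip does not install NVIDIA GPU drivers, so a working driver must already be present; depending on your platform, the CUDA-enabled PyTorch build may also need to be installed from the PyTorch package index.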

In [None]:
%pip install torch==2.2.2

Build llama-cpp-python with GPU support. llama-index installs its own (CPU-only) build, so llama-cpp-python must be reinstalled afterwards with the CUDA build flags set.

In [None]:
%%cmd
set "CMAKE_ARGS=-DLLAMA_CUBLAS=on" && set "FORCE_CMAKE=1" && pip install llama-cpp-python==0.2.32 --force-reinstall --upgrade --no-cache-dir --verbose
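The `%%cmd` cell above only works on Windows. As a sketch for other platforms, the same build can be started from Python with the variables passed explicitly to pip. `gpu_reinstall_command` and `RUN_BUILD` are hypothetical names, and the `-DLLAMA_CUBLAS=on` flag is taken from the cell above; newer llama-cpp-python releases use a different CMake flag, so check the project README for your version.

```python
import os
import subprocess
import sys


def gpu_reinstall_command(version: str = "0.2.32"):
    """Build the pip command and environment to rebuild llama-cpp-python with cuBLAS."""
    env = dict(os.environ)
    # Same flags as the Windows cell above (valid for this pinned version)
    env["CMAKE_ARGS"] = "-DLLAMA_CUBLAS=on"
    env["FORCE_CMAKE"] = "1"
    cmd = [
        sys.executable, "-m", "pip", "install", f"llama-cpp-python=={version}",
        "--force-reinstall", "--upgrade", "--no-cache-dir", "--verbose",
    ]
    return cmd, env


cmd, env = gpu_reinstall_command()

# The build is slow; flip this to True to actually run it.
RUN_BUILD = False
if RUN_BUILD:
    subprocess.run(cmd, env=env, check=True)
```

Using `sys.executable -m pip` installs into the same interpreter that runs the notebook kernel.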

Install ChromaDB.

In [None]:
%pip install chromadb

Set up a local embedding model.

In [None]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    cache_folder=model_cache_dir,
)

Set up a local LLM for inference.

In [4]:
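from llama_index.llms.llama_cpp import LlamaCPP

# Create inference client
lcpp_llm = LlamaCPP(
    model_path=model_path,
    context_window=4096,  # matches llama.context_length in the GGUF metadata
    max_new_tokens=2048,
    # LlamaCPP writes its own `temperature` into generate_kwargs, so set it here
    temperature=0.0,
    verbose=True,
    generate_kwargs={
        "top_p": 0.95,
        "repeat_penalty": 1.2,
        "top_k": 50,
    },
    # Passed to llama_cpp.Llama; change these settings depending on your GPU.
    # Quantization is fixed by the GGUF file, so Transformers options such as
    # `torch_dtype` or `load_in_8bit` do not apply here.
    model_kwargs={
        "n_gpu_layers": 43,  # a value above the model's layer count (32) should offload all layers
        "n_threads": 2,
        "n_batch": 512,
    },
)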

print(f"\nmodel config::\n{lcpp_llm.Config}")


model config::
<class 'llama_index.core.base.llms.base.BaseLLM.Config'>


AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | 
Model metadata: {'general.name': 'LLaMA v2', 'general.architecture': 'llama', 'llama.context_length': '4096', 'llama.rope.dimension_count': '128', 'llama.embedding_length': '4096', 'llama.block_count': '32', 'llama.feed_forward_length': '11008', 'llama.attention.head_count': '32', 'tokenizer.ggml.eos_token_id': '2', 'general.file_type': '16', 'llama.attention.head_count_kv': '32', 'llama.attention.layer_norm_rms_epsilon': '0.000001', 'tokenizer.ggml.model': 'llama', 'general.quantization_version': '2', 'tokenizer.ggml.bos_token_id': '1', 'tokenizer.ggml.unknown_token_id': '0'}


In [7]:
# Set up paths/names
vector_store_path = "chromadb"
collection_name = "quickstart"

# Source document id
import uuid
source_unique_id = uuid.uuid4().hex  # same as str(uuid.uuid4()) without the dashes
source_file_path = "./data/org-chart.md"  # or "./data/device-list.md"
print(f"source id::{source_unique_id}")

source id::440c7a39ab424306ba1215e81d0f4dbe


Create documents using node parsers and text-splitters.

In [11]:
import uuid
from llama_index.core import Document, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import IndexNode
# from llama_index.readers.file import MarkdownReader

# Chunking parameters (chunk_size is measured in tokens)
chunk_size = 250
chunk_overlap = 0

# Read the source files and merge each one into a single source `Document`
sources = [source_file_path]
documents = []
for s in sources:
    # A new id per source file; the last one is reused for the `source` record below
    source_unique_id = uuid.uuid4().hex
    # The markdown reader may return several Documents (one per heading) for one file
    loaded_docs = SimpleDirectoryReader(input_files=[s]).load_data()
    loaded_doc = loaded_docs[0]  # all share the same file-level metadata
    doc_text = [d.get_content() for d in loaded_docs]
    # create metadata
    chunk_metadata = {
        **loaded_doc.metadata,
        "sourceId": source_unique_id,  # may not be needed if chunks know our id_ from their index_id
    }
    # create the source document
    source_doc = Document(
        id_=source_unique_id,
        text="".join(doc_text),
        metadata=chunk_metadata,
        # @TODO `relationships` object is empty?
    )
    # Tell the query engine to ignore this metadata key
    source_doc.excluded_llm_metadata_keys.append("sourceId")
    source_doc.excluded_embed_metadata_keys.append("sourceId")
    documents.append(source_doc)

print(f"\nCreated {len(documents)} docs\n{documents}")

# Split the source documents into chunk nodes
chunk_nodes = []

text_splitter = SentenceSplitter(
    # Split along major headings (h2), then by whole sentences
    paragraph_separator="\n## ",
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
)

for doc in documents:
    # Split only the current document (passing all `documents` would duplicate chunks)
    curr_nodes = text_splitter.get_nodes_from_documents(
        documents=[doc],
        show_progress=True,
    )
    for curr_node in curr_nodes:
        # Each chunk points back to its parent document via index_id
        chunk_node = IndexNode(
            text=curr_node.text or "None",
            index_id=str(doc.id_),
            metadata=doc.metadata,
        )
        # Tell the query engine to ignore these metadata keys (copies, not shared lists)
        chunk_node.excluded_llm_metadata_keys = list(doc.excluded_llm_metadata_keys)
        chunk_node.excluded_embed_metadata_keys = list(doc.excluded_embed_metadata_keys)
        chunk_nodes.append(chunk_node)

print(f"Added {len(chunk_nodes)} chunks to collection::\n{chunk_nodes}")


Created 1 docs
[Document(id_='a578e8eb95df4129b3280558daf79174', embedding=None, metadata={'file_path': 'data\\org-chart.md', 'file_name': 'org-chart.md', 'file_size': 2943, 'creation_date': '2024-04-15', 'last_modified_date': '2024-04-16', 'sourceId': 'a578e8eb95df4129b3280558daf79174'}, excluded_embed_metadata_keys=['sourceId'], excluded_llm_metadata_keys=['sourceId'], relationships={}, text='\n\nXYZ Software Solutions Organization Chart\n\r\nWelcome to XYZ Software Solutions, where innovation meets excellence. Our organization is structured to foster collaboration, creativity, and growth. Below is an overview of our talented team members and their respective roles.\r\n\r\n\n\nExecutive Leadership Team\n\r\n\n\n1. Tim Blare\n\r\n- **Position:** CEO\r\n- **Team:** Executive Leadership\r\n- **Contact:** tim.blare@xyzsoftware.com | (555) 123-4567\r\n\r\n\n\n2. Samantha Barton\n\r\n- **Position:** Chief Technology Officer (CTO)\r\n- **Team:** Executive Leadership\r\n- **Contact:** saman


Added 4 chunks to collection::
[IndexNode(id_='e30db339-ef0a-4b32-9ca5-26018687c537', embedding=None, metadata={'file_path': 'data\\org-chart.md', 'file_name': 'org-chart.md', 'file_size': 2943, 'creation_date': '2024-04-15', 'last_modified_date': '2024-04-16', 'sourceId': 'a578e8eb95df4129b3280558daf79174'}, excluded_embed_metadata_keys=['sourceId'], excluded_llm_metadata_keys=['sourceId'], relationships={}, text='XYZ Software Solutions Organization Chart\n\r\nWelcome to XYZ Software Solutions, where innovation meets excellence. Our organization is structured to foster collaboration, creativity, and growth. Below is an overview of our talented team members and their respective roles.\r\n\r\n\n\nExecutive Leadership Team\n\r\n\n\n1. Tim Blare\n\r\n- **Position:** CEO\r\n- **Team:** Executive Leadership\r\n- **Contact:** tim.blare@xyzsoftware.com | (555) 123-4567\r\n\r\n\n\n2. Samantha Barton\n\r\n- **Position:** Chief Technology Officer (CTO)\r\n- **Team:** Executive Leadership\r\n- **
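To get a feel for what the splitter above does, here is a simplified, standard-library sketch: split on `\n## ` headings, then greedily pack whole sentences into chunks. It is an approximation, not `SentenceSplitter`'s actual algorithm: it counts whitespace-separated words instead of tokens, never merges adjacent short sections (SentenceSplitter can), and keeps an over-long sentence whole. `split_markdown` is a hypothetical name.

```python
import re
from typing import List


def split_markdown(text: str, max_words: int = 50, separator: str = "\n## ") -> List[str]:
    """Greedy chunker: split on h2 headings, then pack whole sentences up to max_words."""
    chunks = []
    for section in text.split(separator):
        # Break after ., ! or ? followed by whitespace
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", section.strip()) if s]
        current: List[str] = []
        current_len = 0
        for sentence in sentences:
            n = len(sentence.split())  # word count stands in for token count
            if current and current_len + n > max_words:
                chunks.append(" ".join(current))
                current, current_len = [], 0
            current.append(sentence)
            current_len += n
        if current:
            chunks.append(" ".join(current))
    return chunks
```

Lowering `max_words` produces more, smaller chunks, which is the same trade-off `chunk_size` controls above: smaller chunks give more precise retrieval but less context per chunk.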

Or, create documents using the LlamaParse service.

In [None]:
# Install LlamaParse
%pip install llama-parse
# For notebooks only
%pip install nest-asyncio
# Loads LLAMA_CLOUD_API_KEY from a .env file
%pip install python-dotenv

In [9]:
import os
from dotenv import load_dotenv
from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import MarkdownElementNodeParser
# notebooks only
import nest_asyncio
nest_asyncio.apply()

# Get an api key at https://cloud.llamaindex.ai/login
load_dotenv()
llama_parse_api_key = os.getenv("LLAMA_CLOUD_API_KEY")

# Parse .pdf files using LlamaParse service.
# Any input file format is converted to text/markdown.
# https://github.com/run-llama/llama_parse/tree/main
parser = LlamaParse(
    api_key=llama_parse_api_key,  # can also be set in your env as LLAMA_CLOUD_API_KEY
    result_type="markdown",  # "markdown" and "text" are available
    num_workers=8,  # if multiple files passed, split in `num_workers` API calls
    verbose=True,
    language="en",  # Optionally you can define a language, default=en
)

# ----------------
# Loading methods
# ----------------

# sync
# documents = parser.load_data(source_file_path)

# sync batch
# documents = parser.load_data(["./my_file1.pdf", "./my_file2.pdf"])

# async
# documents = await parser.aload_data(source_file_path)

# async batch
# documents = await parser.aload_data(["./my_file1.pdf", "./my_file2.pdf"])

# You can also use it with SimpleDirectoryReader
source_file_path = "./data/META-Q1-2024-Earnings-Presentation.pdf"
# file_extractor = {".pdf": parser}
# documents = SimpleDirectoryReader(
#     input_files=[source_file_path],
#     file_extractor=file_extractor,
# ).load_data()
documents = parser.load_data(source_file_path) # or aload_data

print(f"\nAdded {len(documents)} chunks to collection.")
for d in documents:
    print(f"\n\nDocument--{d.id_}::\n{d.text}\n{d.get_content()}\n{d.metadata}")

# Parse the documents using MarkdownElementNodeParser
# node_parser = MarkdownElementNodeParser(llm=lcpp_llm, num_workers=8)

# Retrieve nodes (text) and objects (table)
# nodes = node_parser.get_nodes_from_documents(documents)
# base_nodes, objects = node_parser.get_nodes_and_objects(nodes)

# print(f"\n\nRetrieved nodes::\n{nodes}")
# print(f"\n\nRetrieved objects::\n{base_nodes}\n\n{objects}")

Started parsing the file under job_id 3cd57625-4d9e-4f57-a5b7-0c826f2d66b6

Added 1 chunks to collection.


Document--74a096f6-f247-4333-b170-aeaae25f8fcb::
NO_CONTENT_HERE
---
| |Q1'22|Q2'22|Q3'22|Q4'22|Q1'23|Q2'23|Q3'23|Q4'23|Q1'24|
|---|---|---|---|---|---|---|---|---|---|
|US & Canada|$38,706| | | | | |$33,643|$4,447|$35,635|
|Asia-Pacific| | | | |$31,254|$31,498|$4,137|$7,316|$4,519|
|Europe|$26,998|$28,152|$27,237| |$3,377|$28,101|$3,664| |$7,338|
|Rest of World|$2,949|$3,169|$3,047| |$5,968|$3,229|$6,435|$6,829| |
| |$5,661|$5,835|$5,717|$6,904|$5,893|$7,268|$7,721|$9,159|$8,327|
| |$6,364|$6,360|$5,707|$6,269| | | | |Rest of World|
| |$12,024|$12,788|$12,766| |$15,005|$12,710|$14,131|$14,956|$17,784|$15,451|Asia-Pacific|

Our revenue by user geography is
---
|Revenue by User Geography|
|---|
|In Millions|
|$40,111|
|$34,146|$4,573|$36,455|
|$32,165|$31,999|$4,251|$7,512|$4,667|
|$27,908|$28,822|$27,714|$3,429|$28,645|$3,739|$7,481|
|$2,992|$3,213|$3,100|$6,050|$3,292|$6,515|$6,

Or, create document chunks with a simple data loader.

In [12]:
from llama_index.core import SimpleDirectoryReader

# load all files in a given directory
# documents = SimpleDirectoryReader("./data").load_data()

# load documents from specific file(s)
documents = SimpleDirectoryReader(
    input_files=["./data/device-list.md"],
).load_data()

print(f"num docs: {len(documents)}\ndocuments::\n{documents}")

num docs: 4
documents::
[Document(id_='34d6d2a9-7d06-4db7-9435-8236b6dfd0e5', embedding=None, metadata={'file_path': 'data\\device-list.md', 'file_name': 'device-list.md', 'file_size': 1211, 'creation_date': '2024-04-15', 'last_modified_date': '2024-03-11'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='\n\nHardware Inventory for Smart TV Team\n\r\nThis is an ongoing documentation of all devices in use by the team. We track what device and model each team member has in their possession.\r\n\r\n', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), Document(id_='3af6b6e3-20c5-4138-95ed-020555a43463', embedding=None, metadata={'file_path': 'data\\device-list.md

Or, create document chunks manually (using example data).

In [None]:
from llama_index.core import Document

# https://docs.llamaindex.ai/en/stable/module_guides/loading/documents_and_nodes/usage_documents/

# Create a single document from example text
document_text = "This is some example text for testing RAG functionality."
document = Document(text=document_text)
# Record the source id in each chunk
document.metadata["sourceId"] = source_unique_id
# Tell query engine to ignore this metadata key
document.excluded_llm_metadata_keys.append("sourceId")
document.excluded_embed_metadata_keys.append("sourceId")

# You can also quickly create a document using some default text
# document = Document.example()

# Collect the chunks (here, just the one document)
documents = [document]

print(f"Added chunks to collection::\n{document}")

In [69]:
# set metadata on each chunk
for chunk in documents:
    # Record the source's id in each chunk
    chunk.metadata["sourceId"] = source_unique_id
    # Tell the query engine to ignore this metadata key (skip if already excluded)
    if "sourceId" not in chunk.excluded_llm_metadata_keys:
        chunk.excluded_llm_metadata_keys.append("sourceId")
    if "sourceId" not in chunk.excluded_embed_metadata_keys:
        chunk.excluded_embed_metadata_keys.append("sourceId")

print(f"All chunks::\n{documents}")

All chunks::
[Document(id_='339ee735-4758-4b1b-b1aa-976f2ffc9eea', embedding=None, metadata={'file_path': 'data\\device-list.md', 'file_name': 'device-list.md', 'file_size': 1211, 'creation_date': '2024-04-15', 'last_modified_date': '2024-03-11', 'sourceId': '9fa741c2d17d44d9a8aa7cfb648b13fd'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date', 'sourceId'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date', 'sourceId'], relationships={}, text='\n\nHardware Inventory for Smart TV Team\n\r\nThis is an ongoing documentation of all devices in use by the team. We track what device and model each team member has in their possession.\r\n\r\n', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), Document(id_='2a9f7e06-f1b4-4c14-9520-58ef975d5113',

Create a record for each source document. This will be added to the collection's `sources` list. Each source will track its `chunkIds` as well as other metadata for easy reference.

In [12]:
# record the ids of each chunk
chunks_ids = []
for chunk in chunk_nodes: # or `documents`
    chunks_ids.append(chunk.node_id) # the chunk's own id (ref_doc_id would be the parent document's id, if set)
    print(f"chunk id: {chunk.node_id}\n")

# create a document object to store metadata
source = dict(
    id=source_unique_id,
    checksum="", # the hash of the parsed file
    fileType="", # type of the source (ingested) file
    filePath="", # path to parsed file
    fileName="", # name of parsed file
    fileSize=0, # bytes
    name="", # document name
    description="Summarization of source contents.",
    createdAt="",
    modifiedLast="",
    chunkIds=chunks_ids,
)

print(f"\nids::\n{chunks_ids}\n")
print(f"source::\n{source}")

chunk id: e30db339-ef0a-4b32-9ca5-26018687c537

chunk id: 82c4f628-0501-4788-b4dd-f1cb964e5990

chunk id: d10b398b-6041-4f1d-a47c-537c22d4ae77

chunk id: 3f783068-7571-45bc-aec6-1f4c2901e032


ids::
['e30db339-ef0a-4b32-9ca5-26018687c537', '82c4f628-0501-4788-b4dd-f1cb964e5990', 'd10b398b-6041-4f1d-a47c-537c22d4ae77', '3f783068-7571-45bc-aec6-1f4c2901e032']

source::
{'id': 'a578e8eb95df4129b3280558daf79174', 'checksum': '', 'fileType': '', 'filePath': '', 'fileName': '', 'fileSize': 0, 'name': '', 'description': 'Summarization of source contents.', 'createdAt': '', 'modifiedLast': '', 'chunkIds': ['e30db339-ef0a-4b32-9ca5-26018687c537', '82c4f628-0501-4788-b4dd-f1cb964e5990', 'd10b398b-6041-4f1d-a47c-537c22d4ae77', '3f783068-7571-45bc-aec6-1f4c2901e032']}
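The `source` record above leaves `checksum`, `fileSize`, and the other file fields empty. A minimal sketch for filling them with the standard library follows. The field meanings are assumptions (SHA-256 as the checksum, `createdAt` as the record's creation time in UTC, `name` as the file stem), and `build_source_record` is a hypothetical helper.

```python
import hashlib
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def build_source_record(
    file_path: str,
    chunk_ids: List[str],
    source_id: Optional[str] = None,
    description: str = "",
) -> dict:
    path = Path(file_path)
    stat = path.stat()
    # Hash the file in 1 MiB blocks so large files are not read into memory at once
    sha256 = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            sha256.update(block)
    return dict(
        id=source_id or uuid.uuid4().hex,
        checksum=sha256.hexdigest(),
        fileType=path.suffix.lstrip("."),
        filePath=str(path),
        fileName=path.name,
        fileSize=stat.st_size,  # bytes
        name=path.stem,
        description=description,
        createdAt=datetime.now(timezone.utc).isoformat(),
        modifiedLast=datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        chunkIds=list(chunk_ids),
    )
```

For example: `source = build_source_record(source_file_path, chunks_ids, source_id=source_unique_id)`. Storing the checksum makes it possible to skip re-ingesting a file whose contents have not changed.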



To use Chroma to store the embeddings from a VectorStoreIndex, you need to:

- initialize the Chroma client
- create a Collection to store your data in Chroma
- assign Chroma as the vector_store in a StorageContext
- initialize your VectorStoreIndex using that StorageContext

In [13]:
import json
import chromadb
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

# initialize client, setting path to save data
db = chromadb.PersistentClient(path=vector_store_path)

# create collection metadata (Chroma metadata values must be scalars, so lists are stored as JSON strings)
tags = [] # List[str]
sources = [] # List[dict]
collection_metadata = dict(
    tags=json.dumps(tags),
    description="Summarization of collection contents.",
    sources=json.dumps(sources),
)

# create collection
chroma_collection = db.get_or_create_collection(
    name=collection_name,
    metadata=collection_metadata,
    # embedding_function=embedding_function, # set via Settings
)

# assign chroma as the vector_store to the context
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# create an index from Document[]
# index = VectorStoreIndex.from_documents(
#     documents=documents,
#     # transformations=[text_splitter], # optional
#     storage_context=storage_context,
#     show_progress=True,
# )

# create an index from IndexNodes[]
index = VectorStoreIndex(
    nodes=chunk_nodes,
    storage_context=storage_context,
    show_progress=True,
    # embed_model=embed_model, # from Settings
)
# No explicit persist() call is needed: chromadb.PersistentClient writes to `vector_store_path`


If the documents were added successfully, also add the source record to the collection's `metadata.sources` list.

In [14]:
import json

# get current collection's sources
print(f"current collection::\n{chroma_collection}")
prev_sources_json = chroma_collection.metadata.get("sources")
prev_sources = json.loads(prev_sources_json)
print(f" \nprev_sources::\n{prev_sources_json}")

# create new sources
new_sources = json.dumps([*prev_sources, source])
print(f"\nnew sources::\n{new_sources}")

# update the collection's metadata
new_collection_metadata = dict(chroma_collection.metadata)  # copy before modifying
new_collection_metadata["sources"] = new_sources
chroma_collection.modify(metadata=new_collection_metadata)
print(f"\nupdated collection::\n{chroma_collection}")

current collection::
name='quickstart' id=UUID('95fb018a-cc38-4792-afae-aaca25b64242') metadata={'description': 'Summarization of collection contents.', 'sources': '[]', 'tags': '[]'} tenant='default_tenant' database='default_database'
 
prev_sources::
[]

new sources::
[{"id": "a578e8eb95df4129b3280558daf79174", "checksum": "", "fileType": "", "filePath": "", "fileName": "", "fileSize": 0, "name": "", "description": "Summarization of source contents.", "createdAt": "", "modifiedLast": "", "chunkIds": ["e30db339-ef0a-4b32-9ca5-26018687c537", "82c4f628-0501-4788-b4dd-f1cb964e5990", "d10b398b-6041-4f1d-a47c-537c22d4ae77", "3f783068-7571-45bc-aec6-1f4c2901e032"]}]

updated collection::
name='quickstart' id=UUID('95fb018a-cc38-4792-afae-aaca25b64242') metadata={'description': 'Summarization of collection contents.', 'sources': '[{"id": "a578e8eb95df4129b3280558daf79174", "checksum": "", "fileType": "", "filePath": "", "fileName": "", "fileSize": 0, "name": "", "description": "Summarization
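A small helper can keep this read-modify-write of the JSON-encoded `sources` list in one place without mutating the collection's metadata dict. This is a sketch; `with_appended_source` is a hypothetical name, and it assumes, as the notebook does, that `sources` is stored as a JSON string because Chroma collection metadata values are scalars (strings, numbers, booleans).

```python
import json
from typing import Optional


def with_appended_source(collection_metadata: Optional[dict], source: dict) -> dict:
    """Return a copy of the metadata with `source` appended to the JSON `sources` list."""
    updated = dict(collection_metadata or {})  # copy so the original dict is untouched
    sources = json.loads(updated.get("sources") or "[]")
    sources.append(source)
    # Store the list as a JSON string to keep the metadata value a scalar
    updated["sources"] = json.dumps(sources)
    return updated
```

Usage in this notebook: `chroma_collection.modify(metadata=with_appended_source(chroma_collection.metadata, source))`.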

If you've already created and stored your embeddings, load them directly instead of reloading your documents or creating a new VectorStoreIndex:

In [8]:
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext

# initialize client
db = chromadb.PersistentClient(path=vector_store_path)

# get collection
chroma_collection = db.get_or_create_collection(
    name=collection_name
    # embedding_function=embedding_function, # from Settings
)

# assign chroma as the vector_store to the context
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# load your index from stored vectors
index = VectorStoreIndex.from_vector_store(
    vector_store,
    storage_context=storage_context,
)

If you've already created/loaded an index, you can add new documents to your index using the insert method.

In [19]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader(input_files=["./data/org-chart.md"]).load_data()
print(f"num docs loaded: {len(documents)}\n{documents}\n")

# Add new documents
for doc in documents:
    index.insert(doc)

num docs loaded: 19
[Document(id_='294164c0-b57d-4c2a-83c6-6914a979b959', embedding=None, metadata={'file_path': 'data\\org-chart.md', 'file_name': 'org-chart.md', 'file_size': 2943, 'creation_date': '2024-04-15', 'last_modified_date': '2024-04-16'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='\n\nXYZ Software Solutions Organization Chart\n\r\nWelcome to XYZ Software Solutions, where innovation meets excellence. Our organization is structured to foster collaboration, creativity, and growth. Below is an overview of our talented team members and their respective roles.\r\n\r\n', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), Document(id_='0c4c53fb-8795-4

Look up a source document and its chunks in a specified collection.

In [16]:
# get sources
sources_json = chroma_collection.metadata.get("sources")
sources = json.loads(sources_json)

# gather all chunks for each source
for s in sources:
    s_id = s.get("id")
    print(f"source::{s_id}")
    chunk_ids = s.get("chunkIds")
    for chunk_id in chunk_ids:
        print(f"chunk_id::{chunk_id}")
    # get all chunks
    doc_chunks = chroma_collection.get(
        where={"sourceId": s_id}, # find by source id
        # ids=chunk_ids, # or find by the chunk ids recorded in the source
        # include=["metadatas", "documents"], # these are included by default
    )
    # create a list of chunk results
    results = []
    doc_chunk_ids = doc_chunks["ids"]
    for i, chunk_id in enumerate(doc_chunk_ids):
        chunk_text = doc_chunks["documents"][i]
        chunk_metadata = doc_chunks["metadatas"][i]
        result = dict(
            id=chunk_id,
            text=chunk_text,
            metadata=chunk_metadata,
        )
        results.append(result)

    # return all chunks for this source
    print(f"\nsource chunk ids::{doc_chunk_ids}")
    print(f"\nnum chunks:{len(results)}\n\nchunks::\n{results}")
    s["chunks"] = results

# return all sources for this collection
print(f"\nnum sources:{len(sources)}\n\nsources::{sources}")

source::a578e8eb95df4129b3280558daf79174
chunk_id::e30db339-ef0a-4b32-9ca5-26018687c537
chunk_id::82c4f628-0501-4788-b4dd-f1cb964e5990
chunk_id::d10b398b-6041-4f1d-a47c-537c22d4ae77
chunk_id::3f783068-7571-45bc-aec6-1f4c2901e032

source chunk ids::['3f783068-7571-45bc-aec6-1f4c2901e032', '82c4f628-0501-4788-b4dd-f1cb964e5990', 'd10b398b-6041-4f1d-a47c-537c22d4ae77', 'e30db339-ef0a-4b32-9ca5-26018687c537']

num chunks:4

chunks::
[{'id': '3f783068-7571-45bc-aec6-1f4c2901e032', 'text': 'Kevin Wilson\n\r\n- **Position:** Director of Customer Support\r\n- **Team:** Customer Support\r\n- **Contact:** kevin.wilson@xyzsoftware.com | (555) 234-5678\r\n\r\n\n\n13. Amanda Clark\n\r\n- **Position:** Customer Support Specialist\r\n- **Team:** Customer Support\r\n- **Contact:** amanda.clark@xyzsoftware.com | (555) 345-6789', 'metadata': {'_node_content': '{"id_": "3f783068-7571-45bc-aec6-1f4c2901e032", "embedding": null, "metadata": {"file_path": "data\\\\org-chart.md", "file_name": "org-chart.md",

Get all collection names

In [13]:
collections_list = db.list_collections()
for coll in collections_list:
    print(coll.name)
    collection = db.get_collection(coll.name)
    print(collection)
print(collections_list)

quickstart
name='quickstart' id=UUID('95fb018a-cc38-4792-afae-aaca25b64242') metadata={'description': 'Summarization of collection contents.', 'sources': '[{"id": "a578e8eb95df4129b3280558daf79174", "checksum": "", "fileType": "", "filePath": "", "fileName": "", "fileSize": 0, "name": "", "description": "Summarization of source contents.", "createdAt": "", "modifiedLast": "", "chunkIds": ["e30db339-ef0a-4b32-9ca5-26018687c537", "82c4f628-0501-4788-b4dd-f1cb964e5990", "d10b398b-6041-4f1d-a47c-537c22d4ae77", "3f783068-7571-45bc-aec6-1f4c2901e032"]}]', 'tags': '[]'} tenant='default_tenant' database='default_database'
[Collection(name=quickstart)]


Build prompts.

In [13]:
from llama_index.core import PromptTemplate

# Gracefully handle failed retrievals
refine_template_str = (
    "The original question is as follows: {query_str}\nWe have provided an"
    " existing answer: {existing_answer}\nWe have the opportunity to refine"
    " the existing answer (only if needed) with some more context"
    " below.\n------------\n{context_str}\n------------\nUsing both the new"
    " context and your own knowledge, update or repeat the existing answer.\n"
)
custom_refine_prompt = PromptTemplate(refine_template_str)

# Build Prompt
prompt_str = "Tell me the names of the team members who own LG devices."

SYSTEM_PROMPT = """You are an AI assistant that answers questions in a friendly manner, based on the given source documents. Here are some rules you always follow:
- Generate human readable output, avoid creating output with gibberish text.
- Generate only the requested output, don't include any other language before or after the requested output.
- Never say thank you, that you are happy to help, that you are an AI agent, etc. Just answer directly.
- Generate professional language typically used in business documents in North America.
- Never generate offensive or foul language.
"""

# Build template
def custom_prompt_template():
    # PROMPT_TEMPLATE = f"SYSTEM: {SYSTEM_PROMPT}\n\nUser: {query_str}\n\nASSISTANT:\n"
    PROMPT_TEMPLATE = (
        "SYSTEM: We have provided context information below.\n"
        "---------------------\n"
        "{context_str}"
        "\n---------------------\n"
        "Given this information, please answer the question:\n\n"
        "User: {query_str}\n\n"
        "ASSISTANT:\n"
    )
    return PromptTemplate(
        template=PROMPT_TEMPLATE,
        prompt_type="custom_default",
    )
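To see what the LLM actually receives, the template's `{context_str}` and `{query_str}` placeholders can be filled by hand. This sketch uses plain `str.format` on the same template text; `PromptTemplate.format(context_str=..., query_str=...)` behaves similarly. The retrieved chunks are made-up placeholders, and joining them with blank lines is an assumption about how the engine builds `context_str`.

```python
# Same template text as in custom_prompt_template() above
PROMPT_TEMPLATE = (
    "SYSTEM: We have provided context information below.\n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question:\n\n"
    "User: {query_str}\n\n"
    "ASSISTANT:\n"
)

# Hypothetical retrieved chunks (placeholders, not real data)
retrieved_chunks = [
    "Alice: LG TV, model ABC",
    "Carol: Vizio TV, model XYZ",
]

prompt = PROMPT_TEMPLATE.format(
    context_str="\n\n".join(retrieved_chunks),
    query_str="Tell me the names of the team members who own LG devices.",
)
print(prompt)
```

Printing the filled prompt is a quick way to check that retrieved context actually reaches the model before tuning `similarity_top_k` or the template wording.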

Create a query engine to handle queries.

In [14]:
# create a query engine and query
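similarity_top_k = 3
# Note: in "tree_summarize" mode the synthesizer formats prompts with its `summary_template`;
# `text_qa_template` and `refine_template` are used by modes such as "compact" and "refine".
response_mode = "tree_summarize"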
query_engine = index.as_query_engine(
    llm=lcpp_llm,
    text_qa_template=custom_prompt_template(),
    similarity_top_k=similarity_top_k,
    response_mode=response_mode,
    refine_template=custom_refine_prompt,
)
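`similarity_top_k` controls how many of the nearest chunks are retrieved and passed to the LLM. A pure-Python sketch of that ranking with cosine similarity follows; it is illustrative only. Chroma uses L2 distance by default (configurable through the collection's `hnsw:space` metadata), which ranks unit-normalized embeddings in the same order as cosine similarity. `cosine_similarity` and `top_k` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], chunks: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Rank chunk embeddings by similarity to the query and keep the best k."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    # Asking for more results than stored chunks simply returns every chunk
    return scored[:k]
```

With only two chunks stored, asking for `k=3` returns two results, which matches the Chroma warning about requested results below.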

Ask some questions.

In [15]:
# device-list.md
response1 = query_engine.query("How many device models are there in total? Think carefully through your decision and then list each model name. Then explain your reasoning.")
print(f"response1::\n{response1}\n")

response2 = query_engine.query("Tell me how many team members own LG devices? Explain your reasoning.")
print(f"response2::\n{response2}")

# org-chart.md
# response3 = query_engine.query("Give me contact info for Jessica Carter.")
# print(f"response3::\n{response3}\n")

# response4 = query_engine.query("How many employees of company 'XYZ Software Solutions' belong to engineering team? What are their names?")
# print(f"response4::\n{response4}\n")

# response5 = query_engine.query("I have a very important decision that requires input from someone from company 'XYZ Software Solutions' in QA about a recent change engineering made. Who is the most appropriate person I can reach out to?")
# print(f"response5::\n{response5}\n")

Number of requested results 3 is greater than number of elements in index 2, updating n_results = 2
Number of requested results 3 is greater than number of elements in index 2, updating n_results = 2


response1::
 There are a total of 4 different device models listed in the provided data: Xbox, LG, Vizio, and FPP.
Reasoning: Based on the information provided, there are four distinct device models mentioned: Xbox (with two different model names), LG (also with two different model names), Vizio, and FPP. These are the only models listed in the data, so there must be a total of 4 different devices models present.



Llama.generate: prefix-match hit


response2::
 Based on the provided data, there are two team members who own LG devices: Bob Johnson and Danny Cantor. This can be inferred from the "Hardware Inventory for Smart TV Team" section of the document, which lists every type of hardware platform currently in use by the team, including LG (under the "Type" column). Within this list, there are two models of LG devices: DEF and FPP. Therefore, two team members own at least one LG device each.


To unload the local model from `llama-index`:

In [31]:
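import gc

# The query engine also holds a reference to the LLM, so drop both
del query_engine
del lcpp_llm
gc.collect()  # collect any reference cycles that still keep the model alive
try:
    print(f"Try to read {lcpp_llm}")
except NameError:
    print("Model has been unloaded.")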


Model has been unloaded.
